Publications
Proceedings, refereed
Are Cloud Platforms Ready for Multi-Cloud?
In The European Conference on Service-Oriented and Cloud Computing (ESOCC). Cham, Switzerland: Springer, 2020. Status: Published
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | The European Conference on Service-Oriented and Cloud Computing (ESOCC) |
Pagination | 56-73 |
Publisher | Springer |
Place Published | Cham, Switzerland |
Building an Open-Source Cross-Cloud DevOps stack for a CRM Enterprise Application: A Case Study
In IFIP International Conference on Open Source Systems. Springer, 2019. Status: Published
Open-source software solutions play a critical role for SMEs by enabling easy access to reusable software. Also, with the rapid growth in the popularity of cloud technologies, the computational demands of SMEs are cost-efficiently met by public clouds, as users can dynamically acquire resources on demand according to their needs. However, non-standardized cloud interfaces, lack of inter-cloud transparency, and complex cost models often result in vendor lock-in. Once locked in, cloud users have to live with a single cloud provider and accept whatever pricing schemes and SLAs are imposed. Moreover, new regulations covered by the General Data Protection Regulation (GDPR) in Europe require companies to enforce policies regarding secure storage of data in the cloud, as well as to restrict the movement of confidential datasets outside Europe. This situation requires a more transparent use of cloud resources from multiple cloud providers that conforms to users' data privacy needs, service requirements, and budget.
In this paper, we discuss and pitfalls of designing a Cross-Cloud
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | IFIP International Conference on Open Source Systems |
Pagination | 3-11 |
Date Published | 05/2019 |
Publisher | Springer |
Data Center Clustering for Geographically Distributed Cloud Deployments
In International Workshop on Recent Advances for Multi-Clouds and Mobile Edge Computing (M²EC 2019) in conjunction with the 33rd International Conference on Advanced Information Networking and Applications (AINA). Matsue, Japan: Springer, 2019. Status: Published
Geographically distributed application deployments are critical for a variety of cloud applications, such as those employed in the Internet-of-Things (IoT), edge computing, and multimedia. However, selecting appropriate cloud data centers for an application from a large number of available locations is a difficult task. Users need to consider several aspects in data center selection, such as inter-data center network performance, data transfer costs, and the application's network performance requirements. This paper proposes a data center clustering mechanism that groups suitable cloud data centers together in order to automate the data center selection task according to the application's needs. Employing our clustering mechanism, we present four different types of clustering schemes, each giving different importance to available bandwidth, latency, and cloud costs between data centers. The proposed clustering schemes are evaluated using a large number of data centers from two major public clouds, Amazon Web Services and Google Cloud Platform. The results, based on a comprehensive empirical evaluation of cluster quality, show that the proposed clustering schemes are very effective in optimizing data center selection according to the application requirements.
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | International Workshop on Recent Advances for Multi-Clouds and Mobile Edge Computing (M²EC 2019) in conjunction with the 33rd International Conference on Advanced Information Networking and Applications (AINA) |
Publisher | Springer |
Place Published | Matsue, Japan |
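To make the idea of weighting bandwidth, latency, and cost concrete, here is a minimal Python sketch of weighted data center grouping. It is not the paper's clustering mechanism: the single-linkage grouping, the threshold, the equal default weights, the data center names, and the normalized metric values are all hypothetical assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical normalized pair metrics in [0, 1]: (bandwidth, latency, cost).
PairMetrics = Dict[FrozenSet[str], Tuple[float, float, float]]

DCS = ["aws-eu-west", "aws-eu-central", "gcp-europe-west", "gcp-us-east"]
METRICS: PairMetrics = {
    frozenset(("aws-eu-west", "aws-eu-central")): (0.9, 0.1, 0.2),
    frozenset(("aws-eu-west", "gcp-europe-west")): (0.7, 0.1, 0.6),
    frozenset(("aws-eu-central", "gcp-europe-west")): (0.6, 0.2, 0.6),
}
for _dc in DCS[:3]:
    METRICS[frozenset((_dc, "gcp-us-east"))] = (0.2, 0.9, 0.7)


def dissimilarity(metrics: PairMetrics, a: str, b: str,
                  w_bw: float, w_lat: float, w_cost: float) -> float:
    """Weighted dissimilarity: high bandwidth lowers it, latency and cost raise it."""
    bw, lat, cost = metrics[frozenset((a, b))]
    return w_bw * (1.0 - bw) + w_lat * lat + w_cost * cost


def cluster(dcs: List[str], metrics: PairMetrics, threshold: float,
            w_bw: float = 1 / 3, w_lat: float = 1 / 3, w_cost: float = 1 / 3) -> List[List[str]]:
    """Single-linkage grouping: join two data centers if their dissimilarity is below threshold."""
    parent = {dc: dc for dc in dcs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the union-find trees shallow
            x = parent[x]
        return x

    for a, b in combinations(dcs, 2):
        if dissimilarity(metrics, a, b, w_bw, w_lat, w_cost) < threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for dc in dcs:
        groups.setdefault(find(dc), []).append(dc)
    return sorted(sorted(g) for g in groups.values())


# Equal weights versus a latency-only scheme over the same data centers.
print(cluster(DCS, METRICS, threshold=0.2))
print(cluster(DCS, METRICS, threshold=0.25, w_bw=0.0, w_lat=1.0, w_cost=0.0))
```

Changing only the weights regroups the same data centers: with equal weights the cross-provider European pair stays apart because of its cost, while a latency-only scheme merges all three European locations.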
Mobile Edge as Part of the Multi-Cloud Ecosystem: A Performance Study
In Proceedings of the 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). Pavia, Lombardia/Italy: IEEE Computer Society, 2019. Status: Published
Cloud computing has revolutionized the way applications are used and deployed: applications run cost-effectively in remote data centers. With the increasing need for mobility and micro-services, particularly with the upcoming 5G mobile broadband networks, there is also a strong demand for mobile edge computing (MEC): applications run in small cloud systems in close proximity to the user in order to minimize latency. Both cloud and MEC have their advantages and disadvantages. Combining the two approaches in a unified multi-cloud, consisting of both traditional cloud services provisioned over heterogeneous cloud platforms and MEC systems, has the potential to obtain the best of both worlds. However, a comprehensive study is needed to evaluate the performance gains and the overheads involved for real-world cloud applications. In this paper, we introduce a baseline performance evaluation in order to identify the fallacies and pitfalls of combining multiple cloud systems and MEC into a unified MEC-multi-cloud platform. For this purpose, we analyze the basic, application-independent performance metrics of average round-trip time (RTT) and average application payload throughput in a setup consisting of two private cloud systems and one public cloud system. This baseline performance analysis confirms the feasibility of MEC-multi-cloud and provides guidelines for designing autonomic resource provisioning solutions, in the form of a proposed extension to our existing Melodic middleware platform for multi-cloud applications.
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing, NorNet, The Center for Resilient Networks and Applications, Simula Metropolitan Center for Digital Engineering |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | Proceedings of the 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) |
Pagination | 59-66 |
Date Published | 02/2019 |
Publisher | IEEE Computer Society |
Place Published | Pavia, Lombardia/Italy |
ISBN Number | 978-1-7281-1644-0 |
Keywords | Cloud computing, latency, Mobile edge computing, Multi-Cloud, Performance |
DOI | 10.1109/PDP.2019.00017 |
Towards Realistic Simulations of Arbitrary Cross-Cloud Workloads
In The International Workshop on Recent Advances for Multi-Clouds and Mobile Edge Computing (M2EC) held in conjunction with 33rd International Conference on Advanced Information Networking and Applications (AINA 2019). Springer Link, 2019. Status: Published
Over the last few years, cloud computing has established itself as a popular computing paradigm. Thanks to ease of deployment, elastic resource provisioning, high availability, and an attractive pay-as-you-go economic model, clouds offer significant advantages over traditional cluster computing architectures. More recently, Multi-Cloud solutions have also been explored to take advantage of the most suitable public cloud offerings as well as to tackle vendor lock-in.
Cloud research often requires designing new algorithms, techniques, and solutions that need large-scale cloud deployments for comprehensive evaluation. Simulations are a powerful and cost-effective tool for testing, evaluating, and repeatedly experimenting with new cloud algorithms. Unfortunately, even though cloud federation and hybrid cloud simulations have been explored in the literature, Cross-Cloud simulations are still largely unsupported in most popular cloud simulation frameworks. In this paper, we present a Cross-Cloud simulation framework that makes it possible to test scheduling and reasoning algorithms on Cross-Cloud deployments with arbitrary workloads. Support for Cross-Cloud simulations, where individual application components may be deployed on different cloud platforms, can be a valuable asset in selecting an appropriate mixture of cloud services for applications. We also implement a Cross-Cloud aware reasoner using our Cross-Cloud simulation framework. Simulations using both simple applications and complex multi-stage workflows show that the Cross-Cloud aware reasoner can substantially reduce cloud usage costs for most multi-component cloud applications.
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | The International Workshop on Recent Advances for Multi-Clouds and Mobile Edge Computing (M2EC) held in conjunction with 33rd International Conference on Advanced Information Networking and Applications (AINA 2019) |
Pagination | 1020-1029 |
Publisher | Springer Link |
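The cost intuition behind a Cross-Cloud aware reasoner can be sketched in a few lines of Python: without placement constraints, choosing the cheapest provider per component never costs more than the cheapest single-provider deployment. This is a hypothetical illustration, not the paper's reasoner or simulation framework; the component names, providers, hourly prices, and the `allowed` placement constraint are assumptions.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical hourly prices (USD) per component and provider.
Prices = Dict[str, Dict[str, float]]

PRICES: Prices = {
    "frontend": {"aws": 0.10, "gcp": 0.08, "azure": 0.12},
    "database": {"aws": 0.30, "gcp": 0.40, "azure": 0.35},
    "analytics": {"aws": 0.50, "gcp": 0.45, "azure": 0.40},
}


def single_cloud_plan(prices: Prices) -> Tuple[str, float]:
    """Cheapest deployment when all components must share one provider."""
    common = set.intersection(*(set(offers) for offers in prices.values()))
    totals = {pr: sum(prices[c][pr] for c in prices) for pr in common}
    best = min(sorted(totals), key=totals.__getitem__)
    return best, totals[best]


def cross_cloud_plan(prices: Prices,
                     allowed: Optional[Dict[str, Set[str]]] = None) -> Tuple[Dict[str, str], float]:
    """Place each component on its cheapest allowed provider, independently of the others."""
    allowed = allowed or {}
    plan: Dict[str, str] = {}
    for comp, offers in prices.items():
        permitted = allowed.get(comp, set(offers))  # no constraint means every provider is allowed
        candidates = sorted(pr for pr in offers if pr in permitted)
        plan[comp] = min(candidates, key=offers.__getitem__)
    return plan, sum(prices[c][plan[c]] for c in plan)


print(single_cloud_plan(PRICES))                                 # azure is the cheapest single provider
print(cross_cloud_plan(PRICES))                                  # mixes gcp, aws, and azure
print(cross_cloud_plan(PRICES, {"database": {"gcp", "azure"}}))  # e.g. a hypothetical residency rule
```

With these made-up prices the mixed plan comes to about 0.78 USD/hour versus 0.87 for the best single provider; restricting where the database may run narrows the gap without removing it.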
Talks, invited
Data-Intensive Computing on Cross-Clouds
In Gjøvik, Norway, 2019. Status: Published
Clouds offer significant advantages over traditional cluster computing architectures, including flexibility, high availability, ease of deployment, and on-demand resource allocation, all packed into an attractive pay-as-you-go economic model for users. However, cloud users are often forced into vendor lock-in due to incompatible APIs, cloud-specific services, and complex pricing models used by cloud service providers (CSPs). Cloud management platforms (CMPs) supporting hybrid and multi-cloud deployments offer an answer by providing a unified abstract interface to multiple cloud platforms. Nonetheless, modelling applications to use multi-clouds, automated resource selection from various available CSPs based on user requirements, cost optimization, security, and runtime adaptation of deployed applications and services remain challenging.
In this talk, I'll give an introduction to Melodic, a middleware platform for Cross-Cloud data-intensive applications. The Melodic platform enables data-intensive applications to run seamlessly within defined security, cost, and performance boundaries on geographically distributed and federated cloud infrastructures. Melodic thereby realizes the potential of heterogeneous cloud environments for big data and data-intensive applications by transparently taking advantage of the distinct characteristics of available private and public clouds, dynamically optimizing resource utilization, considering data locality, conforming to the user's privacy needs and service requirements, and countering vendor lock-in.
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | Gjøvik, Norway |
Type of Talk | NTNU CCIS Seminar |
Miscellaneous
Tutorial: Good Bye Vendor Lock-in: Getting your Cloud Applications Multi-Cloud Ready!
The 19th IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid): ACM IEEE, 2019. Status: Published
Clouds offer significant advantages over traditional cluster computing architectures, including flexibility, high availability, ease of deployment, and on-demand resource allocation, all packed into an attractive pay-as-you-go economic model for users. However, cloud users are often forced into vendor lock-in due to incompatible APIs, cloud-specific services, and complex pricing models used by cloud service providers (CSPs). Cloud management platforms (CMPs) supporting hybrid and multi-cloud deployments offer an answer by providing a unified abstract interface to multiple cloud platforms. Nonetheless, modelling applications to use multi-clouds, automated resource selection from various available CSPs based on user requirements, cost optimization, security, and runtime adaptation of deployed applications and services remain challenging.
In this tutorial, we provide a practical introduction to multi-cloud application modelling, configuration, deployment, and adaptation. We survey existing CMPs, compare their features and modelling methods, and, not least, provide practical hands-on training for getting your applications ready for multi-clouds using selected tools. By the end of this tutorial, attendees should understand the various tools and technologies available for multi-clouds and be prepared to spin off their first multi-cloud ready application.
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Miscellaneous |
Year of Publication | 2019 |
Publisher | ACM IEEE |
Place Published | The 19th IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid) |
Journal Article
A Self-Adaptive Network for HPC Clouds: Architecture, Framework, and Implementation
IEEE Transactions on Parallel and Distributed Systems 29, no. 12 (2018): 2658-2671. Status: Published
Clouds offer flexible and economically attractive compute and storage solutions for enterprises. However, the effectiveness of cloud computing for high-performance computing (HPC) systems remains questionable. When clouds are deployed on lossless interconnection networks, like InfiniBand (IB), challenges related to load balancing, low-overhead virtualization, and performance isolation prevent full utilization of the underlying interconnect. Moreover, cloud data centers are highly dynamic environments, which renders the static network reconfigurations typically used in IB systems infeasible. In this paper, we present a framework for a self-adaptive network architecture for HPC clouds based on lossless interconnection networks, demonstrated by means of our IB prototype implementation. Our solution, based on a feedback control and optimization loop, enables the lossless HPC network to dynamically adapt to varying traffic patterns, current resource availability, and workload distributions, in accordance with service provider-defined policies. Furthermore, we present IBAdapt, a simplified rule-based language for service providers to specify the adaptation strategies used by the framework. Our self-adaptive IB network prototype is demonstrated using state-of-the-art industry software. The results obtained on a test cluster demonstrate the feasibility and effectiveness of the framework in improving Quality-of-Service compliance in HPC clouds.
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | Journal Article |
Year of Publication | 2018 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 29 |
Issue | 12 |
Pagination | 2658-2671 |
Publisher | IEEE |
DOI | 10.1109/TPDS.2018.2842224 |
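The abstract describes adaptation strategies written as rules and evaluated inside a feedback loop. The Python sketch below shows only that general condition-action pattern; it does not reproduce IBAdapt's syntax or semantics, and the metric names, rule names, thresholds, and action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Rule:
    """A hypothetical condition-action rule: if `condition` holds, request `action`."""
    name: str
    condition: Callable[[Metrics], bool]
    action: str
    priority: int = 0


def adaptation_step(metrics: Metrics, rules: List[Rule]) -> List[str]:
    """One pass of a monitor -> evaluate -> act loop.

    Returns the actions of all rules whose condition holds, highest priority first
    (ties keep the order in which the rules were declared).
    """
    fired = [r for r in rules if r.condition(metrics)]
    fired.sort(key=lambda r: -r.priority)  # list.sort is stable
    return [r.action for r in fired]


RULES = [
    Rule("congestion", lambda m: m["max_link_util"] > 0.8, "reroute_hot_links", priority=2),
    Rule("isolation", lambda m: m["shared_tenant_links"] > 0, "recompute_isolated_routes", priority=1),
    Rule("underuse", lambda m: m["max_link_util"] < 0.1, "consolidate_paths"),
]

# Two hypothetical monitoring snapshots fed through the same rule set.
print(adaptation_step({"max_link_util": 0.92, "shared_tenant_links": 3}, RULES))
print(adaptation_step({"max_link_util": 0.05, "shared_tenant_links": 0}, RULES))
```

In a real system the returned actions would be handed to the subnet manager or a routing engine; here they are plain strings so the decision logic can be inspected on its own.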
Efficient Routing and Reconfiguration in Virtualized HPC Environments with vSwitch-enabled Lossless Networks
Concurrency and Computation: Practice and Experience 31, no. 2 (2018). Status: Published
To meet the demands of communication-intensive workloads in the cloud, virtual machines (VMs) should use low-overhead network communication paradigms. In general, such paradigms enable VMs to communicate directly with the hardware by means of a passthrough technology like Single-Root I/O Virtualization (SR-IOV). However, when passthrough-based virtualization is coupled with lossless interconnection networks, live migrations introduce scalability challenges due to the substantial network reconfiguration overhead. With these challenges in mind, we proposed a virtual switch (vSwitch) SR-IOV architecture for InfiniBand in (33). In this paper, we first suggest solutions to rectify the space-domain scalability issues present in vSwitch-enabled subnets as a result of VMs using dedicated layer-two addresses. Then we discuss routing strategies for virtualized environments using vSwitches and present a routing algorithm for Fat-Trees. We also present a reconfiguration method that minimizes the imposed reconfiguration overhead on Fat-Trees. We perform an extensive evaluation of our prototype algorithms; as vSwitch-enabled hardware does not yet exist, we draw conclusions from empirical observations obtained by emulating vSwitches with existing hardware, as well as from large-scale simulations. Our results show a significant reduction in reconfiguration times, as route recalculations can be eliminated, and for certain scenarios, the number of reconfiguration subnet management packets sent to switches is reduced from several hundred thousand down to a single one without degrading routing quality.
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | Journal Article |
Year of Publication | 2018 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 31 |
Issue | 2 |
Date Published | 02/2018 |
Publisher | John Wiley & Sons |
Keywords | InfiniBand, Lossless Interconnection Networks, Network Reconfiguration, Network Routing, SR-IOV, Virtualization |
Future Cloud Systems Design: Challenges and Research Directions
IEEE Access 6 (2018): 74120-74150. Status: Published
Cloud computing has been recognized as the de facto computing utility standard for hosting and delivering services over the Internet. Cloud platforms are being rapidly adopted by business owners and end users thanks to their many advantages over traditional computing models, such as cost savings, scalability, unlimited storage, anytime-anywhere access, better security, and high fault tolerance. However, while clouds offer huge opportunities and services to the ICT industry, the landscape of cloud computing research keeps expanding for several reasons, such as emerging data-intensive applications, multicloud deployment models, and stricter non-functional requirements on cloud-based services. In this paper, we study a comprehensive taxonomy of the main cloud computing research areas, discuss state-of-the-art approaches for each area and its associated sub-areas, and highlight the challenges and future directions per research area. The survey framework presented in this paper provides useful insights and an outlook for cloud computing research and development, allows a broader understanding of the design challenges of cloud computing, and sheds light on the future of this fast-growing computing paradigm.
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Journal Article |
Year of Publication | 2018 |
Journal | IEEE Access |
Volume | 6 |
Pagination | 74120-74150 |
Publisher | IEEE |
DOI | 10.1109/ACCESS.2018.2883149 |
Talks, invited
Efficient and cost-effective data-intensive computing on multi-clouds: An introduction to the MELODIC project
In BioInformatics in Torun (BIT), Toruń, Poland, 2017. Status: Published
Data-intensive computing, often simply referred to as big data, is one of the major current trends in ICT. In areas as diverse as social media, business intelligence, information security, Internet-of-Things, and scientific research, a tremendous amount of data is created or collected at a speed surpassing what we can handle using traditional data management techniques. Life sciences are no different. With the vast amount of biological information available, such as Omics data, unprecedented opportunities for modern research and scientific breakthroughs arise, all depending on efficient and cost-effective data analysis. Cloud computing, characterized by on-demand network access to computational resources and a pay-as-you-go economic model, has great potential for providing the computational resources required for data analytics in Bioinformatics. However, challenges such as the lack of data privacy and of data-aware cloud federation keep cloud computing from realizing its full potential for data-intensive applications. At the same time, non-standardized cloud interfaces make it complex to migrate big data applications between platforms, which encourages vendor lock-in and thus prevents cloud users from achieving an optimal cost-performance ratio for their applications.
In this talk, we provide an introduction to the MELODIC H2020 project and show how it can be of great value in Bioinformatics. The vision of MELODIC is to enable federated cloud computing for data-intensive applications and to provide the user with an easy-to-use unified cloud environment that hides the complexity of a multi-cloud. The MELODIC platform enables big data applications to transparently take advantage of the distinct characteristics of available private and public clouds by dynamically optimizing resource allocations, considering data locality and the user's performance and privacy needs. From the user's perspective, the MELODIC framework appears as an infrastructure-agnostic middleware platform supporting the development, deployment, and execution of data-intensive applications on distributed and heterogeneous multi-clouds. For the Bioinformatics community, this could mean using the resources available from multiple cloud providers and private infrastructures in a secure, transparent, efficient, cost-effective, and reliable manner for their big data workloads.
Affiliation | Communication Systems |
Project(s) | MELODIC: Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing |
Publication Type | Talks, invited |
Year of Publication | 2017 |
Location of Talk | BioInformatics in Torun (BIT), Toruń, Poland |
PhD Thesis
Network Optimization for High Performance Cloud Computing
PhD thesis, University of Oslo, 2017. Status: Published
Cloud computing has seen tremendous popularity in the last several years. A scalable and efficient data center network is essential for a performance-capable cloud computing infrastructure. This thesis provides practical solutions that enable an efficient, flexible, multi-tenant network architecture suitable for high-performance cloud computing, using InfiniBand (IB) as a demonstration technology. The work is motivated by the need of future data centers to provide efficient cloud solutions for the increasing uptake of cloud technology for both big data and traditional High-Performance Computing (HPC) applications.
The research contributions of this thesis fall into three main categories. First, we propose a set of improvements to the fat-tree routing algorithm to make it suitable for HPC workloads in the cloud. Fat-tree is a popular network topology in HPC systems. Our proposed improvements make fat-tree routing more efficient, provide performance isolation among tenants in multi-tenant systems, and enable routing of both physical and virtualized end nodes according to the policies set by the provider. Second, we design new network reconfiguration methods that significantly reduce the time it takes to reroute the IB network. Reduced network reconfiguration time means that the interconnection network in an HPC cloud can quickly optimize itself to adapt to changing tenant configurations, faults, running workloads, and current network conditions. Last, we demonstrate a self-adaptive network prototype for IB-based HPC clouds, fully equipped with autonomous monitoring and adaptation, and configurable by service providers through a high-level condition-action language.
The research conducted in this thesis has potential impact on both private cloud infrastructures, such as medium-sized clusters used for enterprise HPC, and public clouds offering innovative HPC solutions to customers at scale. The industrial relevance of the thesis is reflected by the eight patent applications that resulted from this work.
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | PhD Thesis |
Year of Publication | 2017 |
Degree awarding institution | University of Oslo |
Degree | PhD |
Date Published | 12/2017 |
Publisher | University of Oslo |
Place Published | University of Oslo |
URL | http://urn.nb.no/URN:NBN:no-62076 |
Talks, invited
About Management of Exascale Systems
In ExaComm 2016, Frankfurt, 2016. Status: Published
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | Talks, invited |
Year of Publication | 2016 |
Location of Talk | ExaComm 2016, Frankfurt |
Type of Talk | Invited talk |
Journal Article
Compact Network Reconfiguration in Fat-Trees
The Journal of Supercomputing 72, no. 12 (2016): 4438-4467. Status: Published
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | The Journal of Supercomputing |
Volume | 72 |
Issue | 12 |
Pagination | 4438-4467 |
Publisher | Springer |
Efficient Network Isolation and Load Balancing in Multi-Tenant HPC Clusters
Journal of Future Generation Computer Systems (2016). Status: Published
Affiliation | Communication Systems |
Project(s) | No Simula project |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | Journal of Future Generation Computer Systems |
Date Published | 04/2016 |
Publisher | Elsevier |
DOI | 10.1016/j.future.2016.04.003 |
PAVM: A Framework for Policy-Aware Virtual Machine Management
International Journal of Network Management 26, no. 6 (2016): 515-536. Status: Published
The problem of efficient placement of Virtual Machines (VMs) in cloud computing infrastructure is well studied in the literature. The VM placement decision involves selecting a physical machine in the data center to host a specific VM. This decision can play a pivotal role in yielding high efficiency for both the cloud and its users. Reallocation of virtual machines can also be performed through migrations to achieve goals such as higher server consolidation or power saving. VM placement and reallocation decisions may consider affinities between virtual machines defined in multiple dimensions, such as memory sharing, CPU processing, disk sharing, and network bandwidth requirements. Given the NP-hard complexity of computing an optimal solution to this VM placement problem, existing research employs heuristic-based techniques to compute an efficient solution. However, most of these approaches are restricted to a single attribute at a time. That is, a given heuristic for computing VM placement considers only a single attribute, while completely ignoring the impact of other dimensions of VM placement. While this approach may improve efficiency with respect to the affinity attribute under consideration, it may degrade performance with respect to other affinities. In addition, the criteria for determining VM placement efficiency may vary between applications. Hence, achieving overall VM placement efficiency becomes difficult and challenging. Motivated by this problem, we propose PAVM (Policy-Aware Virtual Machine Management), a generic framework for efficient virtual machine management in a cloud computing platform, based on service provider-defined policies, to achieve the desired system-wide goals.
This involves efficient means to profile different virtual machine affinities and to use profiled information effectively by intelligent and efficient virtual machine migrations at runtime considering multiple attributes at a time. By conducting extensive evaluation through simulation and real experiments which involve VM affinities on the basis of network and memory, we confirmed that the PAVM architecture is capable of improving the efficiency of a cloud system. We elaborate the architecture of a PAVM system, describe its implementation and present details of our experiments.
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | International Journal of Network Management |
Volume | 26 |
Issue | 6 |
Pagination | 515-536 |
Date Published | 09/2016 |
Publisher | John Wiley & Sons |
DOI | 10.1002/nem.1948 |
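The contrast the abstract draws between single-attribute heuristics and policy-driven, multi-attribute placement can be illustrated with a small greedy placement function in Python. It is a hypothetical sketch, not PAVM's algorithm: the host and VM names, capacities, affinity scores, and policy weights are assumptions, and affinities are simply combined as a weighted sum.

```python
from typing import Dict, List, Tuple

Affinity = Dict[Tuple[str, str], Dict[str, float]]

# Hypothetical pairwise affinities between VMs, per attribute, normalized to [0, 1].
AFFINITY: Affinity = {
    ("app", "web"): {"network": 0.9, "memory": 0.1},
    ("app", "db"): {"network": 0.3, "memory": 0.8},
}
HOSTS: Dict[str, List[str]] = {"h1": ["web"], "h2": ["db"], "h3": []}
CAPACITY: Dict[str, int] = {"h1": 2, "h2": 2, "h3": 2}


def place_vm(vm: str, hosts: Dict[str, List[str]], capacity: Dict[str, int],
             affinity: Affinity, policy: Dict[str, float]) -> str:
    """Greedily put `vm` on the host with free capacity whose current VMs have the
    highest policy-weighted affinity to it; mutates `hosts` and returns the host name."""

    def pair_score(a: str, b: str) -> float:
        attrs = affinity.get((a, b)) or affinity.get((b, a)) or {}
        return sum(policy.get(attr, 0.0) * value for attr, value in attrs.items())

    candidates = sorted(h for h, vms in hosts.items() if len(vms) < capacity[h])
    if not candidates:
        raise ValueError("no host has free capacity")
    # max() returns the first maximal candidate, so ties go to the alphabetically first host.
    best = max(candidates, key=lambda h: sum(pair_score(vm, other) for other in hosts[h]))
    hosts[best].append(vm)
    return best


network_policy = {"network": 1.0}
memory_policy = {"network": 0.2, "memory": 1.0}
print(place_vm("app", {h: list(v) for h, v in HOSTS.items()}, CAPACITY, AFFINITY, network_policy))  # h1
print(place_vm("app", {h: list(v) for h, v in HOSTS.items()}, CAPACITY, AFFINITY, memory_policy))   # h2
```

The same VM lands on different hosts depending only on the provider's policy weights, which is the kind of policy-driven behaviour the abstract describes; a production framework would also weigh migration cost and resource dimensions beyond a simple slot count.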
Proceedings, refereed
Realizing a Self-Adaptive Network Architecture for HPC Clouds
In The International Conference for High Performance Computing, Networking, Storage and Analysis (SC '16) Doctoral Showcase, 2016. Status: Published
Clouds offer significant advantages over traditional cluster computing architectures, including ease of deployment, rapid elasticity, and an economically attractive pay-as-you-go business model. However, the effectiveness of cloud computing for HPC systems remains questionable. When clouds are deployed on lossless interconnection networks, challenges related to load balancing, low-overhead virtualization, and performance isolation prevent full utilization of the underlying interconnect. In this work, we tackle these challenges and propose a novel holistic framework for a self-adaptive IB subnet for HPC clouds. Our solution consists of a feedback control loop that incorporates optimizations based on a multidimensional objective function, using the current resource configuration and provider-defined policies. We build our system using a bottom-up approach, starting by prototyping solutions to the individual research challenges involved, and later combining these solutions into a working self-adaptive cloud prototype. All our results are demonstrated using state-of-the-art industry software to enable easy integration into running systems.
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | The International Conference for High Performance Computing, Networking, Storage and Analysis (SC '16) Doctoral Showcase |
Patent
System and method for efficient network reconfiguration in fat-trees
2016. Status: Published
Systems and methods are provided for supporting efficient reconfiguration of an interconnection network having a pre-existing routing. An exemplary method can provide a plurality of switches, the plurality of switches comprising at least one leaf switch, wherein each of the switches comprises a plurality of ports, and a plurality of end nodes, wherein the plurality of end nodes are interconnected via the switches. The method can detect, by a subnet manager, a reconfiguration triggering event. The method can compute, by the subnet manager, a new routing for the interconnection network, wherein the computation takes the pre-existing routing into consideration and selects the new routing that is closest to the pre-existing routing. Finally, the method can reconfigure the interconnection network according to the new routing.
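The abstract does not say how "closest" is measured. As a rough illustration only, the Python sketch below assumes that each switch holds a linear forwarding table (LFT) mapping destination LIDs to output ports. It measures distance as the number of LFT entries that differ and picks the closest routing from an already computed list of candidates. The helper names `changed_entries` and `closest_routing`, the dict-based LFT layout, and the candidate enumeration are hypothetical simplifications, not the patented method.

```python
from typing import Dict, List

# Hypothetical layout: switch name -> LFT (destination LID -> output port).
Routing = Dict[str, Dict[int, int]]


def changed_entries(old: Routing, new: Routing) -> int:
    """Count LFT entries that differ between two routings."""
    count = 0
    for switch in old.keys() | new.keys():
        old_lft = old.get(switch, {})
        new_lft = new.get(switch, {})
        for dest in old_lft.keys() | new_lft.keys():
            # A missing entry on either side also counts as a change.
            if old_lft.get(dest) != new_lft.get(dest):
                count += 1
    return count


def closest_routing(old: Routing, candidates: List[Routing]) -> Routing:
    """Return the candidate routing that changes the fewest LFT entries."""
    if not candidates:
        raise ValueError("no candidate routings")
    return min(candidates, key=lambda new: changed_entries(old, new))


if __name__ == "__main__":
    old = {"leaf1": {1: 1, 2: 2, 3: 3}}
    cand_a = {"leaf1": {1: 4, 2: 5, 3: 3}}  # two entries change
    cand_b = {"leaf1": {1: 1, 2: 4, 3: 3}}  # one entry changes
    best = closest_routing(old, [cand_a, cand_b])
    print(changed_entries(old, best))  # prints 1
```

Counting changed entries is a simple proxy for reconfiguration cost, because every changed entry must be redistributed to a switch.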
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | Patent |
Year of Publication | 2016 |
Application Number | US15/073,022 |
Date Published | 03/2016 |
Patent Type | Pending |
System and method for efficient network reconfiguration in fat-trees
2016. Status: Published
Systems and methods are provided for supporting efficient reconfiguration of an interconnection network having a pre-existing routing. An exemplary method can provide a plurality of switches, a plurality of end nodes, and one or more subnet managers, including a master subnet manager. The method can calculate, via the master subnet manager, a first set of one or more leaf-switch to leaf-switch multipaths. The method can store this first set of one or more leaf-switch to leaf-switch multipaths at a metabase. The method can detect a reconfiguration triggering event, and call a new routing for the interconnection network. Finally, the method can reconfigure the network according to the new routing for the interconnection network.
Affiliation | Communication Systems |
Publication Type | Patent |
Year of Publication | 2016 |
Application Number | US14927085 |
Date Published | 05/2016 |
Patent Type | Pending |
System and method for supporting efficient load-balancing in a high performance computing (HPC) environment
2016. Status: Published
Methods and systems are provided for supporting efficient load balancing among a plurality of switches and a plurality of end nodes arranged in a tree topology in a network environment. The methods and systems can sort the plurality of end nodes in decreasing order of receive weight. The methods and systems can then route the end nodes in that order, where routing each end node comprises selecting at least one down-going port and at least one up-going port. Further, the methods and systems can increase the accumulated downward weight on each selected down-going port, and the accumulated upward weight on each selected up-going port, by the receive weight of the routed end node.
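The following is a minimal Python sketch of the weighted routing steps above. It assumes a two-level fat-tree in which every leaf switch has exactly one link to every root switch. End nodes are taken in decreasing order of receive weight. For each one, the sketch picks the root whose down-going link to the node's leaf has the smallest accumulated downward weight. It then adds the node's weight to that link and to the up-going links of all other leaves towards the same root. The function name `weighted_route`, the topology model, and the tie-breaking rule (lowest root index) are assumptions for illustration. The upward weights are only recorded here, not used to choose ports.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

Link = Tuple[int, int]  # (leaf index, root index)


def weighted_route(
    nodes: Dict[str, Tuple[int, float]],  # node -> (leaf index, receive weight)
    num_leaves: int,
    num_roots: int,
) -> Tuple[Dict[str, int], DefaultDict[Link, float], DefaultDict[Link, float]]:
    """Choose a root switch for traffic towards each node, heaviest nodes first."""
    down: DefaultDict[Link, float] = defaultdict(float)  # root -> leaf direction
    up: DefaultDict[Link, float] = defaultdict(float)    # leaf -> root direction
    chosen: Dict[str, int] = {}
    # Sort end nodes in decreasing order of receive weight (stable for ties).
    for node in sorted(nodes, key=lambda n: nodes[n][1], reverse=True):
        leaf, weight = nodes[node]
        # Down-going port: the root->leaf link with the least accumulated
        # downward weight (min() keeps the lowest root index on ties).
        root = min(range(num_roots), key=lambda r: down[(leaf, r)])
        down[(leaf, root)] += weight
        # Up-going ports: traffic from every other leaf climbs to the same root.
        for other in range(num_leaves):
            if other != leaf:
                up[(other, root)] += weight
        chosen[node] = root
    return chosen, down, up


if __name__ == "__main__":
    nodes = {"a": (0, 10.0), "b": (0, 10.0), "c": (0, 1.0), "d": (1, 1.0)}
    chosen, down, up = weighted_route(nodes, num_leaves=2, num_roots=2)
    print(chosen)  # {'a': 0, 'b': 1, 'c': 0, 'd': 0}
```

In the usage example, the two heavy receivers `a` and `b` on leaf 0 end up on different root switches. A purely count-based balancer could instead have paired them on one link.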
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | Patent |
Year of Publication | 2016 |
Application Number | US14792070 |
Date Published | 01/2016 |
Patent Type | Pending |
Notes | Date Filed: October 19, 2015, Published Online: Jan 14, 2016. |
System and method for supporting partition-aware routing in a multi-tenant cluster environment
2016. Status: Published
A system and method can support partition-aware routing in a multi-tenant cluster environment. An exemplary method can support one or more tenants within the multi-tenant cluster environment. The method can associate each of the one or more tenants with a partition of a plurality of partitions. The method can then associate each of the plurality of partitions with one or more nodes of a plurality of nodes, each of the plurality of nodes being associated with a leaf switch of a plurality of switches, the plurality of switches comprising a plurality of leaf switches and a plurality of root switches. Finally, the method can generate one or more linear forwarding tables, the one or more linear forwarding tables providing isolation between the plurality of partitions, wherein each of the plurality of nodes is associated with a partitioning order.
Affiliation | Communication Systems |
Project(s) | ERAC: Efficient and Robust Architecture for the Big Data Cloud |
Publication Type | Patent |
Year of Publication | 2016 |
Application Number | US14927085 |
Date Published | 05/2016 |
Patent Type | Pending |
Notes | Date Filed: July 6, 2015, Published Online: May 5, 2016. |
Public outreach
A Self-adaptive network architecture for InfiniBand based HPC clouds
In Talk at 7th Cloud Control Workshop. Nässlingen, Sweden: 7th Cloud Control Workshop, 2015. Status: Published
Research on network optimization in InfiniBand (IB) networks has evolved in several directions, e.g., increasing network utilization, fault tolerance, congestion control, and energy-aware systems. However, for efficient HPC clouds based on IB, the optimization problem becomes both complex and multidimensional, while individually proposed solutions often yield contradictory management decisions. We believe that a holistic closed-loop control system is required to effectively incorporate a multidimensional objective function in future IB systems. A self-adaptive model of the IB subnet, based on control theory, may help achieve better network utilization while keeping user-level SLAs in HPC clouds.
Affiliation | Communication Systems |
Publication Type | Public outreach |
Year of Publication | 2015 |
Secondary Title | Talk at 7th Cloud Control Workshop |
Publisher | 7th Cloud Control Workshop |
Place Published | Nässlingen, Sweden |
Type of Work | Discussion Session |
Proceedings, refereed
A weighted fat-tree routing algorithm for efficient load-balancing in InfiniBand enterprise clusters
In Proceedings of the 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2015). Turku, Finland: IEEE, 2015. Status: Published
InfiniBand (IB) has become a popular network interconnect for high-performance computing (HPC) systems. Many large IB-based HPC systems use some variant of the fat-tree topology to take advantage of the useful properties fat-trees offer. The fat-tree routing algorithm is one of the most efficient deterministic routing algorithms for fat-tree topologies. The algorithm ensures that the number of routes assigned to each link is balanced across the fabric. However, its load-balancing technique assumes a uniform traffic distribution in the network. When routes towards nodes that mainly consume large amounts of data are assigned to share the same links in the fabric while alternative links are underutilized, network throughput is sub-optimal. Also, because the fat-tree algorithm routes nodes according to their indexing order, performance may differ between two systems cabled in exactly the same way.
In this paper, we propose wFatTree, a novel fat-tree routing algorithm, which considers node traffic characteristics to balance load across the network links more evenly, and with predictable network performance. Our experiments and simulations show an improvement of up to 60% in total network throughput on large fat-tree installations when using wFatTree routing. Furthermore, wFatTree can also be used to prioritize traffic flowing towards the critical nodes in the network.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | Proceedings of the 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2015) |
Pagination | 35-42 |
Date Published | 03/2015 |
Publisher | IEEE |
Place Published | Turku, Finland |
ISSN Number | 1066-6192 |
Accession Number | 15090056 |
Keywords | fat-tree networks, InfiniBand, Load-balancing, Routing algorithms |
DOI | 10.1109/PDP.2015.111 |
Partition-aware routing to improve network isolation in InfiniBand based multi-tenant clusters
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Shenzhen, China: ACM/IEEE, 2015. Status: Published
InfiniBand (IB) is a widely used network interconnect for modern high-performance computing systems. In large IB fabrics, network isolation is provided through partitioning. However, routing is oblivious to the partitions in the network. Hence, physical links carry flows from different partitions. This sharing of intermediate links creates interference, which is particularly critical to avoid in multi-tenant environments such as cloud computing. In such systems, each tenant needs predictable network performance, unaffected by the workload of other tenants. In addition, current routing schemes route the links connecting nodes outside partitions in the same way as the other functional links, even though those links are never used. This may result in degraded load balancing.
In this paper, we present an implementation of a partition-aware fat-tree routing algorithm, pFTree. pFTree uses a multifold mechanism to provide performance isolation among partitions belonging to different tenant groups. Given the available network resources, pFTree starts by isolating partitions at the physical link level, and then moves on to use virtual lanes when needed. Our experiments and simulations show that pFTree significantly reduces the effect of inter-partition interference without any additional functional overhead. Furthermore, pFTree also provides improved load balancing over the state-of-the-art fat-tree routing algorithm.
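The following Python sketch illustrates the two-stage isolation idea described above under strong simplifying assumptions. It models a two-level fat-tree where traffic between leaves crosses exactly one root switch, so giving partitions disjoint sets of root switches isolates them at the physical link level. When there are fewer roots than partitions, partitions that share a root are placed on different virtual lanes (VLs). The function `isolate_partitions` and its round-robin assignment are hypothetical and are not the pFTree algorithm itself.

```python
from typing import Dict, List, Tuple


def isolate_partitions(
    partitions: List[str], roots: List[int], num_vls: int
) -> Dict[str, Tuple[List[int], int]]:
    """Map each partition to (root switches it may use, virtual lane)."""
    if not roots or num_vls < 1:
        raise ValueError("need at least one root switch and one VL")
    plan: Dict[str, Tuple[List[int], int]] = {}
    if len(roots) >= len(partitions):
        # Enough roots: disjoint root sets give link-level isolation.
        for i, p in enumerate(partitions):
            plan[p] = (roots[i :: len(partitions)], 0)
    else:
        # Too few roots: partitions share roots round-robin, and partitions
        # sharing a root are separated onto different VLs (wrapping if VLs run out).
        for i, p in enumerate(partitions):
            plan[p] = ([roots[i % len(roots)]], (i // len(roots)) % num_vls)
    return plan


if __name__ == "__main__":
    print(isolate_partitions(["p1", "p2"], [0, 1, 2, 3], num_vls=8))
    # {'p1': ([0, 2], 0), 'p2': ([1, 3], 0)}
```

The example shows the preferred stage, physical link isolation. VLs only come into play when roots run out, which mirrors the ordering described in the abstract.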
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) |
Pagination | 189-198 |
Date Published | 07/2015 |
Publisher | ACM/IEEE |
Place Published | Shenzhen, China |
ISBN Number | 978-1-4799-8006-2 |
DOI | 10.1109/CCGrid.2015.96 |
SlimUpdate: Minimal Routing Update for Performance-based Reconfigurations in Fat-Trees
In 1st IEEE International Workshop on High-Performance Interconnection Networks Towards the Exascale and Big-Data Era (HiPINEB 2015). IEEE Computer Society, 2015. Status: Published
As the size of high-performance computing systems grows, the number of events requiring a network reconfiguration, as well as the complexity of each reconfiguration, is likely to increase. In large systems, the probability of component failure is high. At the same time, with more network components, ensuring high utilization of network resources becomes challenging. Reconfiguration in interconnection networks, like InfiniBand (IB), typically involves computation and distribution of a new set of routes in order to maintain connectivity and performance. In general, current routing algorithms do not consider the existing routes in a network when calculating new ones. Such configuration-oblivious routing might result in substantial modifications to the existing paths, and the reconfiguration becomes costly as it potentially involves a large number of source-destination pairs.
In this paper, we propose a novel routing algorithm for IB based fat-tree topologies, SlimUpdate. SlimUpdate employs techniques to preserve existing forwarding entries in switches to ensure a minimal routing update, without any performance penalty, and with minimal computational overhead. We present an implementation of SlimUpdate in OpenSM, and compare it with the current de facto fat-tree routing algorithm. Our experiments and simulations show a decrease of up to 80% in the number of total path modifications when using SlimUpdate routing, while achieving similar or even better performance than the fat-tree routing in most reconfiguration scenarios.
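As a simplified illustration of preserving existing forwarding entries, the Python sketch below updates a single switch's LFT after a reconfiguration event. Entries whose output port is still usable are kept, and only the remaining destinations are reassigned to the least-loaded usable port. The helpers `slim_update` and `modified_entries` are hypothetical. The sketch does not attempt to reproduce SlimUpdate's reported lack of performance penalty: keeping entries this way can leave ports unevenly loaded.

```python
from typing import Dict, List


def slim_update(
    old_lft: Dict[int, int],       # destination -> port before the event
    usable: Dict[int, List[int]],  # destination -> usable ports after it
) -> Dict[int, int]:
    """Keep still-valid entries; reroute only the rest to the least-loaded port."""
    load: Dict[int, int] = {}
    new_lft: Dict[int, int] = {}
    # Pass 1: preserve every existing entry whose port is still usable.
    for dest, port in old_lft.items():
        if port in usable.get(dest, []):
            new_lft[dest] = port
            load[port] = load.get(port, 0) + 1
    # Pass 2: assign remaining destinations to the least-loaded usable port.
    for dest, ports in usable.items():
        if dest in new_lft or not ports:
            continue
        port = min(ports, key=lambda p: load.get(p, 0))
        new_lft[dest] = port
        load[port] = load.get(port, 0) + 1
    return new_lft


def modified_entries(old: Dict[int, int], new: Dict[int, int]) -> int:
    """Number of destinations whose forwarding entry changed."""
    return sum(1 for d in old.keys() | new.keys() if old.get(d) != new.get(d))


if __name__ == "__main__":
    old = {1: 10, 2: 10, 3: 11, 4: 11}
    # Port 11 has failed; ports 10 and 12 remain usable for every destination.
    usable = {d: [10, 12] for d in old}
    new = slim_update(old, usable)
    print(new, modified_entries(old, new))  # {1: 10, 2: 10, 3: 12, 4: 12} 2
```

Only the two destinations that used the failed port are modified. A configuration-oblivious recomputation could instead have rewritten all four entries.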
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | 1st IEEE International Workshop on High-Performance Interconnection Networks Towards the Exascale and Big-Data Era (HiPINEB 2015) |
Pagination | 849-856 |
Date Published | 10/2015 |
Publisher | IEEE Computer Society |
ISBN Number | 978-1-4673-6598-7 |
Accession Number | 15570970 |
DOI | 10.1109/CLUSTER.2015.142 |
Poster
ERAC - Efficient and Robust Architecture for Big Data Cloud
2014. Status: Published
The primary objective of the ERAC project is to provide the knowledge and solutions that enable an elastic, scalable, robust, flexible, secure, and energy-efficient cloud architecture that matches both the expectations of Social Networks (SN) and the Internet of Things (IoT) in terms of services and functionality, and the efficiency requirements of the cloud providers. In the project, we shall research, develop, build, and demonstrate cloud technologies for the Future Internet.
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2014 |
Date Published | June |
Keywords | Internal Seminar, University of Oslo |
ERAC - Efficient and Robust Architecture for Big Data Clouds
2014. Status: Published
The primary objective of the ERAC project is to provide knowledge and solutions that enable an elastic, scalable, robust, flexible, secure, and energy efficient cloud architecture that matches both the expectations of the Social Networks (SN) and the Internet of Things (IoT) in terms of services, functionality, and efficiency requirements of the cloud providers. In the project, we shall research, develop, and prototype cloud technologies for the Future Internet.
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2014 |
Date Published | July |
ISBN Number | 978-88-905806-2-8 |