Publications
Proceedings, refereed
Adaptive Routing in InfiniBand Hardware
In The 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing. IEEE, 2022. Status: Accepted
Interconnection networks are the communication backbone of modern high-performance computing systems and an optimised interconnection network is crucial for the performance and utilisation of the system as a whole. One element of the interconnection network is the routing algorithm, which directly influences how we are able to utilise the physical network topology. InfiniBand is one of the most common network architectures used in high-performance computing and traditionally it only supported static routing. For multi-path networks such as Fat-trees, static routing is inefficient because it cannot balance traffic in real time nor utilise multiple paths efficiently under adversarial traffic. This in turn can lead to unnecessary contention and an underutilised network, which has led to numerous proposals on how to avoid this by using adaptive routing. Adaptive routing has recently been introduced in InfiniBand and in this paper we evaluate to what extent the expected benefits of adaptive routing hold for InfiniBand. Through a set of experiments on HDR InfiniBand equipment we describe the basic behaviour of adaptive routing in InfiniBand, its benefits in Fat-tree topologies and the unfortunate side effects related to unfairness that adaptive routing in general might introduce, including such phenomena as the reverse parking lot problem and congestion spreading.
Affiliation | Communication Systems |
Project(s) | Simula Metropolitan Center for Digital Engineering, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2022 |
Conference Name | The 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing |
Publisher | IEEE |
Book Chapter
Smittestopp Backend
In Smittestopp − A Case Study on Digital Contact Tracing, 29-62. Vol. 11. Cham: Springer International Publishing, 2022. Status: Published
An efficient backend solution is of great importance for any large-scale system, and Smittestopp is no exception. The Smittestopp backend comprises various components for user and device registration, mobile app data ingestion, database and cloud operations, and web interface support. This chapter describes our journey from a vague idea to a deployed system. We provide an overview of the system internals and design iterations and discuss the challenges that we faced during the development process, along with the lessons learned. The Smittestopp backend handled around 1.5 million registered devices and provided various insights and analyses before being discontinued a few months after its launch.
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Book Chapter |
Year of Publication | 2022 |
Book Title | Smittestopp − A Case Study on Digital Contact Tracing |
Volume | 11 |
Pagination | 29 - 62 |
Date Published | 06/2022 |
Publisher | Springer International Publishing |
Place Published | Cham |
ISBN Number | 978-3-031-05465-5 |
ISSN | 2512-1677 |
URL | https://link.springer.com/content/pdf/10.1007/978-3-031-05466-2.pdf |
DOI | 10.1007/978-3-031-05466-2_3 |
Journal Article
Nationwide rollout reveals efficacy of epidemic control through digital contact tracing
Nature Communications 12 (2021). Status: Published
Affiliation | Communication Systems, Scientific Computing, Machine Learning |
Project(s) | The Center for Resilient Networks and Applications, Department of Data Science and Knowledge Discovery, Department of Computational Physiology |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Nature Communications |
Volume | 12 |
Number | 5918 |
Publisher | Springer Nature |
DOI | 10.1038/s41467-021-26144-8 |
Patent
System and method of computing ethernet routing paths
In US Patent. H04L45/02 ed, 2020. Status: Published
Affiliation | Communication Systems |
Project(s) | Fabriscale |
Publication Type | Patent |
Year of Publication | 2020 |
Published Source | US Patent |
International Patent Classification | H04L45/02 |
International Patent Number | 10855581 |
Application Number | 16/138,366 |
Date Published | 12/2020 |
URL | https://patents.google.com/patent/US20190149461A1 |
Patent
Method of computing balanced routing paths in fat-trees
In US Patent. H04L45/14 ed, 2019. Status: Published
A device and method for providing balanced routing paths in a computational grid including determining a type of topology of the computational grid having a plurality of levels, wherein each level includes a plurality of switches, determining whether the type of topology of the computational grid is a fat-tree, determining whether the fat-tree is odd, determining whether the fat-tree is a regular fat-tree, computing a first set of routing paths for the computational grid based on the determining of whether the fat-tree is odd and is a regular fat-tree, computing a second set of routing paths for the computational grid using a topology agnostic routing technique, and configuring forwarding tables in said switches with the first set of computed routing paths when the topology is determined to be a fat-tree and with the second set of computed routing paths when the topology is determined to not be a fat-tree.
Affiliation | Communication Systems |
Project(s) | Fabriscale |
Publication Type | Patent |
Year of Publication | 2019 |
Published Source | US Patent |
International Patent Classification | H04L45/14 |
International Patent Number | US10425324B2 |
Application Number | 15/679,974 |
Date Published | 09/2019 |
URL | https://patents.google.com/patent/US10425324B2/en?oq=US10425324B2 |
Journal Article
Early experiences with live migration of SR-IOV enabled InfiniBand
Journal of Parallel and Distributed Computing 78, no. C (2015): 39-52. Status: Published
Virtualization is the key to efficient resource utilization and elastic resource allocation in cloud computing. It enables consolidation, the on-demand provisioning of resources, and elasticity through live migration. Live migration makes it possible to optimize resource usage by moving virtual machines (VMs) between physical servers in an application transparent manner. It does, however, require a flexible, high-performance, scalable virtualized I/O architecture to reach its full potential. This is challenging to achieve with high-speed networks such as InfiniBand and remote direct memory access enhanced Ethernet, because these devices usually maintain their connection state in the network device hardware. Fortunately, the single root IO virtualization (SR-IOV) specification addresses the performance and scalability issues. With SR-IOV, each VM has direct access to a hardware assisted virtual device without the overhead introduced by emulation or para-virtualization. However, SR-IOV does not address the migration of the network device state. In this paper we present and evaluate the first available prototype implementation of live migration over SR-IOV enabled InfiniBand devices.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 78 |
Issue | C |
Pagination | 39-52 |
Date Published | 04/2015 |
Publisher | Elsevier |
Keywords | Architecture, IO virtualization, SR-IOV, VM migration |
DOI | 10.1016/j.jpdc.2015.01.004 |
Journal Article
A New Proposal to Deal With Congestion in InfiniBand-Based Fat-Trees
Journal of Parallel and Distributed Computing 74 (2014): 1802-1819. Status: Published
The overall performance of High-Performance Computing applications may depend largely on the performance achieved by the network interconnecting the end-nodes; thus high-speed interconnect technologies like InfiniBand are used to provide high throughput and low latency. Nevertheless, network performance may be degraded due to congestion, so using techniques to deal with the problems derived from congestion has become practically mandatory. In this paper we propose a straightforward congestion-management method suitable for fat-tree topologies built from InfiniBand components. Our proposal is based on a traffic-flow-to-service-level mapping that prevents, as much as possible with the resources available in current InfiniBand components (basically Virtual Lanes), the negative impact of the two most common problems derived from congestion: head-of-line blocking and buffer-hogging. We also provide a mathematical approach to analyze the efficiency of our proposal and several others, by means of a set of analytical metrics. In certain traffic scenarios, we observe up to 68% of the ideal performance gain that could be achieved in HoL-blocking and buffer-hogging prevention.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 74 |
Number | 1 |
Pagination | 1802-1819 |
Date Published | January |
Publisher | Elsevier |
Keywords | Congestion management, Fat-trees, High-performance computing, InfiniBand, Interconnection networks |
DOI | 10.1016/j.jpdc.2013.09 |
Proceedings, refereed
Multi-Homed Fat-Tree Routing With InfiniBand
In 22nd Euromicro International Conference on Parallel, Distributed and Network-based Processing. IEEE Computer Society, 2014. Status: Published
For clusters where the topology consists of a fat-tree or more than one fat-tree combined into one subnet, there are several properties that the routing algorithms should support, beyond what exists today. One of the missing properties is that the current fat-tree routing algorithm does not guarantee that each port on a multi-homed node is routed through redundant spines, even if these ports are connected to redundant leaves. As a consequence, in case of a spine failure, there is a small window where the node is unreachable until the subnet manager has rerouted to another spine. In this paper, we discuss the need for independent routes for multi-homed nodes in fat-trees by providing real-life examples where a single point of failure leads to complete outage of a multi-port node. We present and implement the methods that may be used to alleviate this problem and perform simulations that demonstrate improvements in performance, scalability, availability and predictability of InfiniBand fat-tree topologies. We show that our methods not only increase the performance by up to 52.6%, but also, and more importantly, that there is no downtime associated with spine switch failure.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | 22nd Euromicro International Conference on Parallel, Distributed and Network-based Processing |
Pagination | 122-129 |
Date Published | 02/2014 |
Publisher | IEEE Computer Society |
ISSN Number | 1066-6192 |
DOI | 10.1109/PDP.2014.22 |
Patent
System and method for signaling dynamic reconfiguration events in a middleware machine environment
G06F11/1423 ed, 2014. Status: Published
Rerouting around faulty components and migration of jobs both require reconfiguration of data structures in the Queue Pairs residing in the hosts on an InfiniBand cluster. In this patent we describe an implementation of dynamic reconfiguration of such host side data-structures. Our implementation preserves the Queue Pairs, and lets the application run without being interrupted. With this implementation, we demonstrate a complete solution to fault tolerance in an InfiniBand network, where dynamic network reconfiguration to a topology-agnostic routing function is used to avoid malfunctioning components. This solution is in principle able to let applications run uninterruptedly on the cluster, as long as the topology is physically connected. Through measurements on our test-cluster we show that the increased cost of our method in setup latency is negligible, and that there is only a minor reduction in throughput during reconfiguration.
Affiliation | Communication Systems |
Project(s) | No Simula project |
Publication Type | Patent |
Year of Publication | 2014 |
International Patent Classification | G06F11/1423 |
International Patent Number | US20130124910A1 |
Application Number | 13/649,689 |
Date Published | 11/2014 |
URL | https://patents.google.com/patent/US8880932B2 |
Proceedings, refereed
Making the Network Scalable: Inter-Subnet Routing in InfiniBand
In Proceedings from the 19th International Euro-Par Conference on Parallel Processing. Vol. 8097. Lecture Notes in Computer Science 8097. Springer Berlin Heidelberg, 2013. Status: Published
As InfiniBand clusters grow in size and complexity, the need arises to segment the network into manageable sections. Up until now, InfiniBand routers have not been used extensively and little research has been done to accommodate them. However, the limits imposed on local addressing space, the inability to logically segment fabrics, long reconfiguration times for large fabrics in case of faults, and, finally, performance issues when interconnecting large clusters, have rekindled the industry's interest in IB-IB routers. In this paper, we examine the routing problems that exist in the current implementation of OpenSM and we introduce two new routing algorithms for inter-subnet IB routing. We evaluate the performance of our routing algorithms against the current solution and we show an improvement of up to 100 times that of OpenSM.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | Proceedings from the 19th International Euro-Par Conference on Parallel Processing |
Volume | 8097 |
Pagination | 685-698 |
Date Published | August |
Publisher | Springer Berlin Heidelberg |
ISBN Number | 978-3-642-40046-9 |
Keywords | Conference |
DOI | 10.1007/978-3-642-40047-6_69 |
Talks, contributed
Prototyping Live Migration With SR-IOV Supported InfiniBand HCAs
In HPC Advisory Council Spain Conference, 2013. Status: Published
Live migration is challenging to achieve with high-speed networks because these devices usually maintain their connection state in the network device hardware. In this work we i) describe the challenges with live migration over SR-IOV enabled InfiniBand devices, and ii) present and evaluate the first available prototype implementation of live migration over SR-IOV enabled InfiniBand devices.
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2013 |
Location of Talk | HPC Advisory Council Spain Conference |
Keywords | Conference |
Proceedings, refereed
A Scalable Signalling Mechanism for VM Migration With SR-IOV Over InfiniBand
In 18th IEEE International Conference on Parallel and Distributed Systems (ICPADS). IEEE Computer Society, 2012. Status: Published
Single Root I/O Virtualization (SR-IOV) is a promising I/O virtualization approach for achieving high performance in virtualization over InfiniBand (IB) networks. One challenge is related to the hardware address assignment for each virtual IB device. There are two schemes for the hardware address assignment: static assignment and dynamic assignment. Static assignment always preserves the hardware address of a virtual IB device that is attached to a VM, but dynamic assignment does not. A drawback of static assignment, however, is that communication will be disconnected after VM migration. In this paper, we point out the problem related to SR-IOV over IB that breaks the network connections after VM migration when static assignment is deployed. Then, we propose a signalling mechanism that can maintain the network connectivity after VM migration. The performance evaluation using an experimental test bed shows that the proposed signalling mechanism does not increase the service downtime during hot migration. We also optimize the signalling method, where the same event can only be forwarded to a physical server once regardless of the hosted VMs, to reduce the management message overhead from O(n * m) to O(n).
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | 18th IEEE International Conference on Parallel and Distributed Systems (ICPADS) |
Pagination | 384-391 |
Publisher | IEEE Computer Society |
ISBN Number | 978-0-7695-4903-3 |
Keywords | Conference |
Discovery and Routing of Degraded Fat-Trees
In 2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies. Los Alamitos: IEEE Computer Society, 2012. Status: Published
The fat-tree topology has become a popular choice for InfiniBand enterprise systems due to its deadlock freedom, fault-tolerance and full bisection bandwidth. In the HPC domain, InfiniBand fabric is used in almost 42% of the systems on the latest Top 500 list, and many of those systems are based on the fat-tree topology. Despite the popularity of the fat-tree topology, little research has been done to compare the behavior of InfiniBand routing algorithms on degraded fat-tree topologies. In this paper, we identify the weaknesses of the current fat-tree routing and propose enhancements that liberalize the restrictions imposed on the routed fabric. Furthermore, we present a thorough analysis of non-proprietary routing algorithms that are implemented in the InfiniBand Open Subnet Manager. Our results show that even though the performance of a fat-tree routed network deteriorates predictably with the number of failed links, the fat-tree routing algorithm is still the best choice for severely degraded fat-tree fabrics.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | 2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies |
Pagination | 689-694 |
Date Published | December |
Publisher | IEEE Computer Society |
Place Published | Los Alamitos |
Keywords | Conference |
Exploring the Scope of the InfiniBand Congestion Control Mechanism
In 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). IEEE Computer Society, 2012. Status: Published
In a lossless interconnection network, network congestion needs to be detected and resolved to ensure high performance and good utilization of network resources at high network load. If no countermeasure is taken, congestion at a node in the network will stimulate the growth of a congestion tree that not only affects contributors to congestion, but also other traffic flows in the network. Left untouched, the congestion tree will block traffic flows, lead to underutilization of network resources and result in a severe drop in network performance. The InfiniBand standard specifies a congestion control (CC) mechanism to detect and resolve congestion before a congestion tree is able to grow and, by that, hamper the network performance. The InfiniBand CC mechanism includes a rich set of parameters that can be tuned in order to achieve effective CC. Even though it has been shown that the CC mechanism, properly tuned, is able to improve both throughput and fairness in an interconnection network, it has been questioned whether the mechanism is fast enough to keep up with dynamic network traffic, and if a given set of parameter values for a topology is robust when it comes to different traffic patterns, or if the parameters need to be tuned depending on the applications in use. In this paper we address both these questions. Using the three-stage fat-tree topology from the Sun Datacenter InfiniBand Switch 648 as a basis, and a simulator tuned against CC capable InfiniBand hardware, we conduct a systematic study of the efficiency of the InfiniBand CC mechanism as the network traffic becomes increasingly dynamic. Our studies show that the InfiniBand CC, even when using a single set of parameter values, performs very well as the traffic patterns become increasingly dynamic, outperforming a network without CC in all cases. Our results show throughput increases varying from a few percent to a seventeen-fold increase.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) |
Pagination | 1131-1143 |
Publisher | IEEE Computer Society |
DOI | 10.1109/IPDPS.2012.104 |
Talks, contributed
Fat-Trees and Dragonflies - a Perspective on Topologies
In Contributed talk at the HPC Advisory Council Switzerland Workshop, Lugano, Switzerland, 2012. Status: Published
One of the foundations of any HPC cluster is the network topology. The choice of topology is dictated by properties such as cost, network technology, and target applications. For InfiniBand (IB) the Fat-tree is the dominating topology, but recently we have also seen IB clusters based on a 3D Torus. From time to time new topologies are invented and in this talk we will have a closer look at the recently proposed Dragonfly topology and compare it with the well-established Fat-tree topology.
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2012 |
Location of Talk | Contributed talk at the HPC Advisory Council Switzerland Workshop, Lugano, Switzerland. |
Keywords | Workshop |
Prototyping Live Migration With SR-IOV Supported InfiniBand
In Contributed talk at the 2012 OpenFabrics International Workshop, Monterey, USA, 2012. Status: Published
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2012 |
Location of Talk | Contributed talk at the 2012 OpenFabrics International Workshop, Monterey, USA |
Keywords | Workshop |
Journal Article
SFtree: a Fully Connected and Deadlock Free Switch-to-Switch Routing Algorithm for Fat-Trees
ACM Transactions on Architecture and Code Optimization 8 (2012). Status: Published
Existing fat-tree routing algorithms fully exploit the path diversity of a fat-tree topology in the context of compute node traffic, but they lack support for deadlock free and fully connected switch-to-switch communication. Such support is crucial for efficient system management, for example in InfiniBand (IB) systems. With the general increase in system management capabilities found in modern InfiniBand switches, the lack of deadlock free switch-to-switch communication is a problem for fat-tree based IB installations because management traffic might cause routing deadlocks that bring the whole system down. This lack of deadlock free communication affects all system management and diagnostic tools using LID routing. In this paper, we propose the sFtree routing algorithm that guarantees deadlock free and fully connected switch-to-switch communication in fat-trees while maintaining the properties of the current fat-tree algorithm. We prove that the algorithm is deadlock free and we implement it in OpenSM for evaluation. We evaluate the performance of the sFtree algorithm experimentally on a small cluster and we do a large-scale evaluation through simulations. The results confirm that the sFtree routing algorithm is deadlock free and show that the impact of switch-to-switch management traffic on the end-node traffic is negligible.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2012 |
Journal | ACM Transactions on Architecture and Code Optimization |
Volume | 8 |
Number | 4 |
Date Published | January |
Publisher | ACM |
DOI | 10.1145/2086696.208673 |
Proceedings, refereed
A Scalable Method for Signalling Dynamic Reconfiguration Events With OpenSM
In 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011). IEEE Computer Society Press, 2011. Status: Published
Rerouting around faulty components, on-the-fly policy changes, and migration of jobs all require reconfiguration of data structures in the Queue Pairs residing in the hosts on an InfiniBand cluster. In addition to a proper implementation at the host, the subnet manager needs to implement a scalable method for signaling reconfiguration events to the hosts. In this paper we propose and evaluate three different implementations for signalling dynamic reconfiguration events with OpenSM. Through our evaluation we demonstrate a scalable solution for signalling host-side reconfiguration events in an InfiniBand network based on an example where dynamic network reconfiguration combined with a topology-agnostic routing function is used to avoid malfunctioning components. Through measurements on our test-cluster and an analytical study we show that our best proposal reduces reconfiguration latency by more than 90% and in certain situations eliminates it completely. Furthermore, the processing overhead in the subnet manager is shown to be minimal.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011) |
Pagination | 332-341 |
Publisher | IEEE Computer Society Press |
ISBN Number | 978-1-4577-0129-0 |
DFtree - a Fat-Tree Routing Algorithm Using Dynamic Allocation of Virtual Lanes to Alleviate Congestion in InfiniBand Networks
In The Network-Aware Data Management Workshop to be held in conjunction with the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11). ACM, 2011. Status: Published
End-point hotspots can cause major slowdowns in interconnection networks due to head-of-line blocking and congestion. Therefore, avoiding congestion is important to ensure high performance for the network traffic. It is especially important in situations where permanent congestion, which results in permanent slowdown, can occur. Permanent congestion occurs when traffic has been moved away from a failed link, when multiple jobs run on the same system and compete for network resources, or when a system is not balanced for the application that runs on it. In this paper we suggest a mechanism for dynamic allocation of virtual lanes and live optimisation of the distribution of flows between the allocated virtual lanes. The purpose is to alleviate the negative effect of permanent congestion by separating network flows into slow lane and fast lane traffic. Flows destined for an end-point hot-spot are placed in the slow lane and all other flows are placed in the fast lane. Consequently, the flows in the fast lane are unaffected by the head-of-line blocking created by the hot-spot traffic. We demonstrate the feasibility of this approach using a modified version of OFED and OpenSM with fat-tree routing on a small InfiniBand cluster. Our experiments show an increase in throughput ranging from 150% to 468% compared to the conventional fat-tree algorithm in OFED.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | The Network-Aware Data Management Workshop to be held in conjunction with the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11) |
Pagination | 1-10 |
Date Published | November |
Publisher | ACM |
ISBN Number | 978-1-4503-1132-8 |
InfiniBand Congestion Control, Modelling and Validation
In 4th International ICST Conference on Simulation Tools and Techniques (SIMUTools2011, OMNeT++ 2011 Workshop). SIMUTools '11. Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, 2011. Status: Published
In a lossless interconnection network congestion may result in performance degradation if no countermeasure is taken. To relieve the consequences of congestion, and by that to achieve good utilization of network resources even at high network load, congestion control (CC) has been added to the InfiniBand specification. The behavior of the InfiniBand CC is, however, governed by a set of CC parameters. Exactly how to set these parameters to ensure an overall efficient network is still not well understood. It is time-consuming, costly and hard to explore the CC parameter space in a large-scale cluster. Therefore, a simulation platform is needed. In this paper we present our CC capable IB model implemented in the OMNeT++ environment. We explain the basics of our model, and validate it against CC capable hardware to show its high accuracy.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | 4th International ICST Conference on Simulation Tools and Techniques (SIMUTools2011, OMNeT++ 2011 Workshop) |
Pagination | 390-397 |
Date Published | March |
Publisher | Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering |
ISBN Number | 978-1-936968-00-8 |
On the Relation Between Congestion Control, Switch Arbitration and Fairness
In 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011). IEEE, 2011. Status: Published
In lossless interconnection networks such as InfiniBand, congestion control (CC) can be an effective mechanism to achieve high performance and good utilization of network resources. The InfiniBand standard describes CC functionality for detecting and resolving congestion, but the design decisions on how to implement this functionality are left to the hardware designer. One must be cautious when making these design decisions not to introduce fairness problems, as our study shows. In this paper we study the relationship between congestion control, switch arbitration, and fairness. Specifically, we look at fairness among different traffic flows arriving at a hot spot switch on different input ports, as CC is turned on. In addition we study the fairness among traffic flows at a switch where some flows are exclusive users of their input ports while other flows are sharing an input port (the parking lot problem). Our results show that the implementation of congestion control in a switch is vulnerable to unfairness if care is not taken. In detail, we found that a threshold hysteresis of more than one MTU is needed to resolve arbitration unfairness. Furthermore, to fully solve the parking lot problem, proper configuration of the CC parameters is required.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011) |
Pagination | 342-351 |
Date Published | May |
Publisher | IEEE |
ISBN Number | 978-1-4577-0129-0 |
DOI | 10.1109/CCGrid.2011.67 |
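The parking lot problem named in the abstract above can be illustrated with a little arithmetic. The following is a minimal sketch, assuming an idealised switch whose arbiter gives every input port an equal slice of the congested output link, with flows on a shared port splitting that slice equally. The function name and the flow layout are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def parking_lot_shares(flows_per_port):
    """Return the bandwidth share of a single flow on each input port.

    Assumes ideal round-robin arbitration between input ports: every
    port gets an equal slice of the output link, and flows sharing a
    port split that slice equally.
    """
    port_share = Fraction(1, len(flows_per_port))
    return [port_share / n for n in flows_per_port]


# One flow alone on port 0, three flows sharing port 1.
shares = parking_lot_shares([1, 3])
print(shares)  # [Fraction(1, 2), Fraction(1, 6)]
```

Under these assumptions the flow that owns its port receives three times the bandwidth of each flow on the shared port, even though all four flows compete for the same output link.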
VFtree - a Fat-Tree Routing Algorithm Using Virtual Lanes to Alleviate Congestion
In Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium. IEEE Computer Society Press, 2011. Status: Published
VFtree - a Fat-Tree Routing Algorithm Using Virtual Lanes to Alleviate Congestion
It is a well-known fact that multiple virtual lanes can improve performance in interconnection networks, but this knowledge has had little impact on real clusters. Currently, a large number of clusters using InfiniBand are based on fat-tree topologies that can be routed deadlock-free using only one virtual lane. Consequently, all the remaining virtual lanes are left unused. In this paper we suggest an enhancement to the fat-tree algorithm that utilizes virtual lanes to improve performance when hot-spots are present. Even though the bisection bandwidth in a fat-tree is constant, hot-spots are still possible and they will degrade performance for flows not contributing to them due to head-of-line blocking. Such a situation may be alleviated through adaptive routing or congestion control; however, these methods are not yet readily available in InfiniBand technology. To remedy this problem, we have implemented an enhanced fat-tree algorithm in OpenSM that distributes traffic across all available virtual lanes without any configuration needed. We evaluated the performance of the algorithm on a small cluster and performed a large-scale evaluation through simulations. In a congested environment, results show that we are able to achieve throughput increases of up to 38% on a small cluster and from 221% to 757%, depending on the hot-spot scenario, for a 648-port simulated cluster.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium |
Pagination | 197-208 |
Publisher | IEEE Computer Society Press |
ISBN Number | 978-1-61284-372-8 |
Talks, contributed
InfiniBand Congestion Control
In Contributed talk at the 2011 OpenFabrics International Workshop, Monterey, USA, 2011. Status: Published
InfiniBand Congestion Control
In InfiniBand networks congestion control (CC) can be an effective mechanism to achieve high performance and high utilisation of network resources. Without CC, congestion in one node may severely degrade overall performance. In this talk we introduce the problem of congestion and how it can be avoided with InfiniBand congestion control.
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2011 |
Location of Talk | Contributed talk at the 2011 OpenFabrics International Workshop, Monterey, USA |
VFtree - a Fat-Tree Routing Algorithm Using Virtual Lanes to Alleviate Congestion
In Invited talk at the HPC Advisory Council Switzerland Workshop 2011, Lugano, Switzerland, 2011. Status: Published
VFtree - a Fat-Tree Routing Algorithm Using Virtual Lanes to Alleviate Congestion
A large number of clusters using InfiniBand are based on fat-tree topologies that can be routed deadlock-free using only one virtual lane. Consequently, all the remaining virtual lanes are left unused. In this talk we present an enhancement to the fat-tree algorithm that utilizes virtual lanes to improve performance when hot-spots are present. In a congested environment, results show that we are able to achieve throughput increases of up to 38% on a small cluster and from 221% to 757%, depending on the hot-spot scenario, for a 648-port simulated cluster.
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2011 |
Location of Talk | Invited talk at the HPC Advisory Council Switzerland Workshop 2011, Lugano, Switzerland. |
Proceedings, refereed
Achieving Predictable High Performance in Imbalanced Fat Trees
In Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems. IEEE Computer Society, 2010. Status: Published
Achieving Predictable High Performance in Imbalanced Fat Trees
The fat-tree topology has become a popular choice for InfiniBand fabrics due to its inherent deadlock freedom, fault-tolerance and full bisection bandwidth. InfiniBand is used by more than 40% of the systems on the latest Top 500 list, and many of these systems are based on a fat-tree topology. However, the current InfiniBand fat-tree routing algorithm suffers from flaws that reduce its scalability and flexibility. Counter-intuitively, the achievable throughput per node deteriorates both when the number of nodes in a tree decreases and when the node distribution among leaves is nonuniform. In this paper, we identify the weaknesses of the current enhanced fat-tree routing algorithm in OpenFabrics Enterprise Distribution and we propose extensions to it that alleviate all performance problems related to node distribution. The new algorithm is implemented in OpenSM for real world evaluation and for future contribution to the OpenFabrics community. We demonstrate that our solution achieves a predictable high throughput regardless of the number of nodes and their distribution. Furthermore, the simulations show that our extensions improve throughput by up to 30% depending on topology size and node distribution.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2010 |
Conference Name | Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems |
Pagination | 381-388 |
Date Published | December |
Publisher | IEEE Computer Society |
ISBN Number | 978-0-7695-4307-9 |
First Experiences With Congestion Control in InfiniBand Hardware
In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). IEEE, 2010. Status: Published
First Experiences With Congestion Control in InfiniBand Hardware
In lossless interconnection networks congestion control (CC) can be an effective mechanism to achieve high performance and good utilization of network resources. Without CC, congestion on one link may grow into a congestion tree that can degrade the performance severely. This degradation not only affects contributors to the congestion, but also throttles innocent traffic flows in the network. The InfiniBand standard describes CC functionality for detecting and resolving congestion. The InfiniBand CC concept is rich in that it specifies a set of parameters that can be tuned in order to achieve effective CC. There is, however, limited experience with the InfiniBand CC mechanism. To the best of our knowledge, only a few simulation studies exist. Recently, InfiniBand CC has been implemented in hardware, and in this paper we present the first experiences with such equipment. We show that the implemented InfiniBand CC mechanism effectively resolves congestion and improves fairness by solving the parking lot problem, if the CC parameters are appropriately set. By conducting extensive testing on a selection of the CC parameters, we have explored the parameter space and found a subset of parameter values that leads to efficient CC for our test scenarios. Furthermore, we show that the InfiniBand CC increases the performance of the well-known HPC Challenge benchmark in a congested network.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2010 |
Conference Name | 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) |
Pagination | 1-12 |
Publisher | IEEE |
ISBN Number | 978-1-4244-6442-5 |
DOI | 10.1109/IPDPS.2010.5470419 |
Host Side Dynamic Reconfiguration With InfiniBand
In 2010 IEEE International Conference on Cluster Computing. IEEE Computer Society, 2010. Status: Published
Host Side Dynamic Reconfiguration With InfiniBand
Rerouting around faulty components and migration of jobs both require reconfiguration of data structures in the Queue Pairs residing in the hosts on an InfiniBand cluster. In this paper we report an implementation of dynamic reconfiguration of such host side data-structures. Our implementation preserves the Queue Pairs, and lets the application run without being interrupted. With this implementation, we demonstrate a complete solution to fault tolerance in an InfiniBand network, where dynamic network reconfiguration to a topology-agnostic routing function is used to avoid malfunctioning components. This solution is in principle able to let applications run uninterruptedly on the cluster, as long as the topology is physically connected. Through measurements on our test-cluster we show that the increased cost of our method in setup latency is negligible, and that there is only a minor reduction in throughput during reconfiguration.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2010 |
Conference Name | 2010 IEEE International Conference on Cluster Computing |
Pagination | 126-135 |
Publisher | IEEE Computer Society |
ISBN Number | 978-0-7695-4220-1 |
Journal Article
Ethernet for High Performance Data Centers - on the New IEEE Data Center Bridging Standards
IEEE Micro 30 (2010): 42-51. Status: Published
Ethernet for High Performance Data Centers - on the New IEEE Data Center Bridging Standards
Ethernet is about to enter the domain of data center and high performance computing by introducing several performance optimizations that will close the performance and functionality gap between Ethernet and its fiercest competitor, InfiniBand. Through the Data Center Bridging Task Group the IEEE is about to expand the 802.1 standard with four new supplements that will both close the performance gap and make the converged network a reality. In a converged network all applications use a single physical infrastructure, e.g. Ethernet or InfiniBand. This is ideal for the next generation of data centers that are now emerging. In this paper we discuss the architectural challenges faced by Ethernet in order to improve performance and make the converged network a reality, and we present the Ethernet enhancements currently being standardized by the IEEE.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2010 |
Journal | IEEE Micro |
Volume | 30 |
Number | 4 |
Pagination | 42-51 |
Date Published | July/August |
Publisher | IEEE |
Talks, contributed
First Experiences With Congestion Control in InfiniBand Hardware
In Invited talk at the HPC Advisory Council Switzerland Workshop 2010, 2010. Status: Published
First Experiences With Congestion Control in InfiniBand Hardware
In InfiniBand networks congestion control (CC) can be an effective mechanism to achieve high performance and high utilisation of network resources. Without CC, congestion in one node may severely degrade overall performance. In this talk we introduce the problem of congestion and how it can be avoided with InfiniBand congestion control.
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | Invited talk at the HPC Advisory Council Switzerland Workshop 2010 |
VFtree - Fat Tree Routing With Virtual Lanes in InfiniBand
In Poster, HiPEAC ACACES, Barcelona, Spain, 2010. Status: Published
VFtree - Fat Tree Routing With Virtual Lanes in InfiniBand
Even though the bisectional bandwidth in a fat-tree is constant, hot spots are still possible. Such a situation may be alleviated through adaptive routing or congestion control; however, these methods are not fully supported in InfiniBand technology. We propose an inexpensive approach for reducing the hot-spot problem in fat-trees, which is based on the application of virtual lanes.
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | Poster, HiPEAC ACACES, Barcelona, Spain |
Proceedings, refereed
A Framework for Routing and Resource Allocation in Network Virtualization
In International Conference on High Performance Computing (HiPC'09). IEEE, 2009. Status: Published
A Framework for Routing and Resource Allocation in Network Virtualization
Computer architectures for high performance computing have traditionally been based on an assumption of one parallel application running alone on one machine. The current trend is, however, that huge computer installations offer compute power to a set of users or customers, each demanding only a subset of the available compute resources. This places new requirements on the architecture, in that it must support dynamic partitioning of the resources into several virtual servers as demand changes. We introduce a novel framework which supports flexible formation of such virtual servers while preventing interference between the communication of different virtual servers. This paper investigates the impacts of a shared interconnection network on applications running on virtual compute servers. We show that the interconnect performance supplied to each job is highly unpredictable, and that a job can experience a performance degradation of 97% when its traffic interferes with the traffic of concurrent jobs. With a minor reduction in the utilization of each processing node, this can be considerably improved through a combination of routing-containment in the interconnection network and a carefully designed resource allocation strategy.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2009 |
Conference Name | International Conference on High Performance Computing (HiPC'09) |
Pagination | 129-139 |
Publisher | IEEE |
ISBN Number | 978-1-4244-4921-7 |
Book Chapter
Scalable Interconnection Networks
In Simula Research Laboratory - by thinking constantly about it, 129-162. Heidelberg: Springer, 2009. Status: Published
Scalable Interconnection Networks
A modern supercomputer or large scale server consists of a huge set of processing units and units that perform different forms of input/output and memory functions. These components unite in a complex collaboration to perform the main tasks of the system. Such collaboration requires communication between the components, which is supported by an infrastructure called the interconnection network. This book chapter describes the interconnection networks research carried out by the ICON group at Simula over the last five years. ICON has focused on how to connect point-to-point links and switches into scalable network topologies, and how to route packets efficiently in order to yield the highest possible performance. This also poses various requirements regarding fault tolerance, quality of service (QoS), congestion control, virtualization, and other non-functional aspects. ICON's research results have been published in several of the most respected IEEE journals and magazines within our field. Furthermore, some of ICON's solutions have had a major impact on the routing architecture of modern supercomputers.
Affiliation | Communication Systems |
Publication Type | Book Chapter |
Year of Publication | 2009 |
Book Title | Simula Research Laboratory - by thinking constantly about it |
Chapter | 14 |
Pagination | 129-162 |
Publisher | Springer |
Place Published | Heidelberg |
ISBN Number | 978-3-642-01155-9 |
Proceedings, refereed
An Analysis of Connectivity and Yield for 2D Mesh Based NoC With Interconnect Router Failures
In 11th EUROMICRO Conference on Digital System Design (DSD). University of Parma, 2008. Status: Published
An Analysis of Connectivity and Yield for 2D Mesh Based NoC With Interconnect Router Failures
The manufacturing process of modern day processors is both costly and complex and there are many different factors that influence the quality of a chip when it comes off the production line. Typically, hundreds of chips are manufactured from a single silicon wafer and as we go deeper into the sub-micron era of microchip manufacturing, the potential for defects during production increases. The advent of multi-core computing may introduce problems related to connectivity and yield for high volume manufacturing (HVM). In this paper we explore potential benefits that fault tolerant routing provides within the NoC (network-on-chip) paradigm with a study of the relationship between connectivity and yield at the interconnect routing level. For dimension-order routing based mesh NoCs, we describe two methods that are logically straightforward to implement and that can be used to increase the yield of chips with interconnect router faults.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2008 |
Conference Name | 11th EUROMICRO Conference on Digital System Design (DSD) |
Date Published | September |
Publisher | University of Parma |
ISBN Number | 978-0-7695-3277-6 |
Dragon Kill Points: Loot Distribution in Massive Multiplayer Online Role Playing Games
In Proceedings of the 7th ACM SIGCOMM workshop on Network and system support for games (NetGames 2008). Association for Computing Machinery (ACM), 2008. Status: Published
Dragon Kill Points: Loot Distribution in Massive Multiplayer Online Role Playing Games
One of the major reasons for playing Massive Multiplayer Online Role Playing Games (MMORPGs) is the possibility to show off your abilities to other players. The rarer your equipment, the higher the show-off value of your character. Because rare items are hard to find, cooperation between several players is often required. This introduces a conflict between the players, and a way to distribute loot is necessary. We introduce the problem of loot distribution in MMORPGs, and we suggest and give a preliminary evaluation of a new and improved Dragon Kill Points system.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2008 |
Conference Name | Proceedings of the 7th ACM SIGCOMM workshop on Network and system support for games (NetGames 2008) |
Pagination | 100-101 |
Date Published | October |
Publisher | Association for Computing Machinery (ACM) |
ISBN Number | 978-1-60558-132-3 |
Journal Article
The Interconnection Network - Architectural Challenges for Utility Computing Data Centres
IEEE Computer Magazine 41 (2008): 62-69. Status: Published
The Interconnection Network - Architectural Challenges for Utility Computing Data Centres
The mode of operation employed by Computational Data Centres that offer Utility Computing differs significantly from that of traditional supercomputers and server clusters, and as such presents new architectural problems that should be studied and solved. In this paper we concentrate on issues facing the interconnection network. We argue that this is a part of the overall architecture where shortcomings in present-day solutions are most severe, and we present a model for the mode of operation of a Utility Computing Data Centre where virtualisation is a main ingredient. Based on this model we identify several areas where the interconnection network faces new challenges and needs new solutions. In each of these areas we give a brief introduction to previous results before we identify the new challenges.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2008 |
Journal | IEEE Computer Magazine |
Volume | 41 |
Number | 9 |
Pagination | 62-69 |
Date Published | September |
Publisher | IEEE |
Proceedings, refereed
Boosting Ethernet Performance by Segment-Based Routing
In Proceedings of the 15th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2007). IEEE Computer Society Press, 2007. Status: Published
Boosting Ethernet Performance by Segment-Based Routing
In this paper we embed an efficient topology agnostic routing algorithm with fault tolerance capabilities into back-pressured Ethernet technology. This makes it possible to use off-the-shelf equipment to build cost-effective systems with an efficient use of all network components. This stands in contrast to the inefficient use of network resources (links) supported by the Spanning Tree Protocol (STP). The Segment-Based Routing Algorithm (SR) is a deterministic routing algorithm that achieves high performance without the use of virtual channels. Furthermore, it is topology agnostic, meaning it can handle any topology and any combination of faults derived from the original topology when combined with static reconfiguration. Through simulations we verify an overall improvement in throughput by a factor of 1.2 to 10.0 compared to the conventional Ethernet routing algorithm, the STP, and other topology agnostic routing algorithms such as Up*/Down* and Tree-based Turn-prohibition, both of which are applicable to Ethernet.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2007 |
Conference Name | Proceedings of the 15th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2007) |
Pagination | 55-62 |
Date Published | February |
Publisher | IEEE Computer Society Press |
ISBN Number | 0-7695-2784-1 |
Effective Shortest Path Routing for Gigabit Ethernet
In Proceedings of the IEEE International Conference on Communications 2007. IEEE Communications Society, 2007. Status: Published
Effective Shortest Path Routing for Gigabit Ethernet
Since its invention at Xerox PARC in 1973, Ethernet technology has proven to be both robust and adaptable. Through several evolutionary steps Ethernet has become an almost ubiquitous communication technology, spanning from local area networking through high performance backplane interconnects (a recent initiative) to metropolitan networking. However, an obstacle still remains for Ethernet to effectively make inroads in application areas such as interconnection and backbone networks. Ethernet's native routing algorithm, the Spanning Tree Protocol, becomes a major performance and utilization bottleneck when network connectivity increases. Since the Spanning Tree Protocol avoids deadlocks and infinitely looping packets by turning any topology into a tree, it leaves a large portion of links unused and thus wastes bandwidth. In this paper we address this weakness by proposing a new routing algorithm which achieves the same goals as the Spanning Tree Protocol, but without disabling any links or prohibiting any turns, and at the same time guaranteeing shortest path routing. Through the use of layered routing we show how to improve performance with respect to both the Spanning Tree Protocol and a more recent proposal called Tree-Based Turn-Prohibition. Extensive simulations show that we are able to increase throughput by a factor of more than 3.5 compared to the Spanning Tree Protocol and a factor of 1.8 compared to Tree-Based Turn-Prohibition. Our concept relies on features introduced in IEEE standards 802.1Q, 802.1D and 802.3x, as well as changes currently discussed in IEEE task forces. We also discuss backwards compatibility together with the changes necessary for enabling layered shortest path routing in Ethernet.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2007 |
Conference Name | Proceedings of the IEEE International Conference on Communications 2007 |
Pagination | 6419-6424 |
Date Published | June |
Publisher | IEEE Communications Society |
ISBN Number | 1-4244-0353-7 |
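The claim that a spanning tree leaves a large portion of links unused follows from a simple count: a tree over N switches uses exactly N - 1 links, so every remaining link is disabled. The sketch below works this out for two example topologies. The 4x4 mesh and 4x4 torus, and all function names, are assumptions chosen for illustration and are not the topologies evaluated in the paper.

```python
def mesh_links(rows, cols):
    """Number of bidirectional links in a rows x cols 2D mesh."""
    return rows * (cols - 1) + cols * (rows - 1)


def torus_links(rows, cols):
    """Number of bidirectional links in a 2D torus (rows, cols >= 3)."""
    return 2 * rows * cols


def unused_by_spanning_tree(num_switches, num_links):
    """Links left idle when the topology is reduced to a spanning tree."""
    return num_links - (num_switches - 1)


for name, links in (("4x4 mesh", mesh_links(4, 4)),
                    ("4x4 torus", torus_links(4, 4))):
    idle = unused_by_spanning_tree(16, links)
    print(f"{name}: {idle} of {links} links unused")
```

The better connected the topology, the larger the idle fraction, which is consistent with the abstract's remark that the bottleneck grows as network connectivity increases.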
PhD Thesis
Quality of Service in Interconnection Networks
Faculty of Mathematics and Natural Sciences, University of Oslo, 2007. Status: Published
Quality of Service in Interconnection Networks
Interconnection networks were traditionally confined to multiprocessors where low latency and high bandwidth were necessary for interprocessor communication. But in the last decade interconnection networks have become crucial in other application areas such as storage area networks and high performance computing clusters. This development has led to an increased interest in supporting quality of service in interconnection networks driven by the wish to converge cluster storage, cluster communication, and cluster management in one single network. In this thesis we propose and study a set of mechanisms to achieve this in existing interconnection technologies. We introduce two new topology agnostic routing algorithms, a service differentiation scheme for the InfiniBand Architecture, and several admission control schemes.
Affiliation | Communication Systems |
Publication Type | PhD Thesis |
Year of Publication | 2007 |
Date Published | November |
Publisher | Faculty of Mathematics and Natural Sciences, University of Oslo |
Thesis Type | phd |
Technical reports
The Interconnection Network - Architectural Challenges for Utility Computing Data Centres
Simula Research Laboratory, 2007. Status: Published
The Interconnection Network - Architectural Challenges for Utility Computing Data Centres
Affiliation | Communication Systems |
Project(s) | No Simula project |
Publication Type | Technical reports |
Year of Publication | 2007 |
Publisher | Simula Research Laboratory |
Notes | This technical report is an earlier version of a published journal article. The published version can be found here: |
Journal Article
An Overview of QoS Capabilities in InfiniBand, Advanced Switching Interconnect, and Ethernet
IEEE Communications Magazine 44 (2006): 32-38. Status: Published
An Overview of QoS Capabilities in InfiniBand, Advanced Switching Interconnect, and Ethernet
A recent trend in interconnection network technologies is the inclusion of various mechanisms to support a variety of Quality of Service concepts. This has been necessitated by an increasing number of application areas that require some level of performance guarantees from the network for parts of its traffic. In this paper we describe and compare the capabilities and support for Quality of Service of the three most important interconnection network technology standards of today. Similarities between the technologies are explained and differences are clarified.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2006 |
Journal | IEEE Communications Magazine |
Volume | 44 |
Number | 7 |
Pagination | 32-38 |
Date Published | July |
Publisher | IEEE |
Layered Routing in Irregular Networks
IEEE Transactions on Parallel and Distributed Systems 17 (2006): 51-65. Status: Published
Layered Routing in Irregular Networks
Freedom from deadlock is a key issue in Cut-Through, Wormhole and Store and Forward networks, and such freedom is usually obtained through careful design of the routing algorithm. Most existing deadlock-free routing methods for irregular topologies do, however, impose severe limitations on the available routing paths. We present a method called Layered Routing, which gives rise to a series of routing algorithms, some of which perform considerably better than previous ones. Our method groups virtual channels into network layers, and to each layer it assigns a limited set of source/destination address pairs. This separation of traffic yields a significant increase in routing efficiency. We show how the method can be used to improve the performance of irregular networks, both through load balancing and by guaranteeing shortest-path routing. The method is simple to implement, and its application does not require any features in the switches other than the existence of a modest number of virtual channels. The performance of the approach is evaluated through extensive experiments within three classes of technologies. These experiments reveal a need for virtual channels as well as an improvement in throughput for each technology class.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2006 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 17 |
Number | 1 |
Pagination | 51-65 |
Date Published | January |
Publisher | IEEE |
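The Layered Routing abstract above describes assigning source/destination pairs to network layers so that routing stays deadlock-free. The abstract does not give the assignment procedure, so the following is only a minimal sketch under an assumed rule: each path goes greedily into the first layer whose channel dependency graph remains acyclic, which is the standard sufficient condition for deadlock freedom in deterministic routing. All function names and the ring example are hypothetical.

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as {node: set(successors)}."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


def dependencies(path):
    """Channel dependencies of a path: each channel depends on the next one."""
    channels = list(zip(path, path[1:]))
    return list(zip(channels, channels[1:]))


def assign_layers(paths, max_layers):
    """Greedily place each path in the first layer that stays cycle-free."""
    layers = [dict() for _ in range(max_layers)]  # one dependency graph per layer
    assignment = {}
    for path in paths:
        deps = dependencies(path)
        for index, graph in enumerate(layers):
            # Work on a copy so a rejected path leaves the layer untouched.
            trial = {ch: set(succ) for ch, succ in graph.items()}
            for a, b in deps:
                trial.setdefault(a, set()).add(b)
            if not has_cycle(trial):
                layers[index] = trial
                assignment[(path[0], path[-1])] = index
                break
        else:
            raise ValueError("not enough layers for a deadlock-free assignment")
    return assignment


# Four two-hop clockwise paths on a 4-node ring form a dependency cycle
# if they all share one layer, so the last one is moved to a second layer.
ring_paths = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
print(assign_layers(ring_paths, max_layers=2))
```

In this example three of the four pairs fit in layer 0 and only the pair that would close the cycle needs layer 1, which matches the abstract's point that a modest number of virtual channels is enough.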
Proceedings, refereed
Segment-Based Routing: an Efficient Fault-Tolerant Routing Algorithm for Meshes and Tori
In 20th IEEE International Parallel & Distributed Processing Symposium. Washington, USA: IEEE Computer Society, 2006. Status: Published
Segment-Based Routing: an Efficient Fault-Tolerant Routing Algorithm for Meshes and Tori
Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Science keeps demanding more processing power for calculations and simulations, growth in E-commerce requires powerful servers to offer seamless online shopping, and massive multiplayer online games require powerful and stable systems to keep their virtual worlds running 24 hours a day. Depending on the problem domain, this demand for more power can be satisfied by either massively parallel computers or clusters of computers. Common to both approaches is the dependence on high performance interconnect networks such as Myrinet, InfiniBand, or 10 Gigabit Ethernet. While high throughput and low latency are key features of interconnection networks, the issue of fault-tolerance is now becoming increasingly important. As the number of network components grows, so does the probability of failure; thus it becomes important to also consider the fault-tolerance mechanism of interconnection networks. The main challenge then lies in combining performance and fault-tolerance, while still keeping cost and complexity low. This paper proposes a new deterministic routing methodology for tori and meshes, which achieves high performance without the use of virtual channels. Furthermore, it is topology agnostic in nature, meaning it can handle any topology derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as Segment-based Routing (SR), works by partitioning a topology into subnets, and subnets into segments. This allows us to place bidirectional turn restrictions locally within a segment. As segments are independent, we gain the freedom to place turn restrictions within a segment independently from other segments. This results in a larger degree of freedom when placing turn restrictions compared to other routing strategies.
In this paper a way to compute segment-based routing tables is presented and applied to meshes and tori. Preliminary evaluation results show that the concept of segments leads to an increase in performance by a factor of 1.8 over FX and up*/down* routing.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2006 |
Conference Name | 20th IEEE International Parallel & Distributed Processing Symposium |
Pagination | 1-10 |
Date Published | April |
Publisher | IEEE Computer Society |
Place Published | Washington, USA |
ISBN Number | 1-4244-0054-6 |
Talks, contributed
The Realization of Virtual Compute Resources in a Utility Computing Data Center (UCDC) in the Many Core Era
In 2006 Workshop on On- and Off-Chip Interconnection Networks for Multicore Systems, 2006. Status: Published
The Realization of Virtual Compute Resources in a Utility Computing Data Center (UCDC) in the Many Core Era
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | 2006 Workshop on On- and Off-Chip Interconnection Networks for Multicore Systems |
Proceedings, refereed
Ethernet As a Lossless Deadlock Free System Area Network
In Proceedings of the International Symposium on Parallel and Distributed Processing and Applications, Nanjing, China May 2-5. Lecture Notes in Computer Science. Heidelberg, Germany: Springer-Verlag GmbH, 2005. Status: Published
Ethernet As a Lossless Deadlock Free System Area Network
The way conventional Ethernet is used today differs in two aspects from how dedicated system area networks are used. Firstly, dedicated system area networks are lossless and only drop frames when bit errors occur, while conventional Ethernet drops frames whenever congestion occurs. Secondly, these networks are either deadlock free or use mechanisms which avoid deadlock situations, while still using all available links. Ethernet avoids deadlocks by using a spanning tree protocol which turns any topology into a tree. A drawback of this approach is that we are left with a lot of unused links and thus waste resources. In this paper we describe how to obtain a lossless, deadlock-free network with the best possible performance, while adhering to the current Ethernet standard and using off-the-shelf Ethernet equipment. We achieve this by introducing flow control in all network nodes and by taking control over the routing algorithm. Also, we use TCP to illustrate the effect of flow control on higher layer protocols. Through simulations we verify the following three improvements. Firstly, the activation of flow control turns Ethernet into a lossless network. Secondly, taking control over the routing algorithm allows us to build any topology without the limitations of the spanning tree protocol. And thirdly, an overall improvement in throughput is achieved by combining these enhancements.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2005 |
Conference Name | Proceedings of the International Symposium on Parallel and Distributed Processing and Applications, Nanjing, China May 2-5 |
Pagination | 901-914 |
Date Published | November |
Publisher | Springer-Verlag GmbH |
Place Published | Heidelberg, Germany |
ISBN Number | 3-540-29769-3 |
Proceedings, refereed
Achieving Flow Level QoS in Cut-Through Networks Through Admission Control and DiffServ
In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). Las Vegas, Nevada: CSREA Press, 2004. Status: Published
Achieving Flow Level QoS in Cut-Through Networks Through Admission Control and DiffServ
Cluster networks will serve as the future access networks for multimedia streaming, massive multiplayer online gaming, e-commerce, network storage, etc. For those application areas, provisioning of Quality of Service (QoS) is becoming an important issue. DiffServ as specified by the IETF is foreseen to be the most prominent concept for providing predictability in the future Internet. To enable seamless interoperation with the higher level IETF concepts, the QoS architecture of the lower layers should comply with the DiffServ paradigm as well. Previous work on predictability in cut-through networks has only studied class based QoS. In this paper we set out to achieve flow level QoS using flow aware admission control in combination with a flow negligent DiffServ inspired QoS mechanism. Our results show that flow level bandwidth guarantees are achievable with the use of the Link-by-Link and the Probe based schemes. In addition, we are able to achieve an order of magnitude improvement in jitter and latency in individual flows.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2004 |
Conference Name | Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA) |
Pagination | 1084-1090 |
Date Published | June 21-24 |
Publisher | CSREA Press |
Place Published | Las Vegas, Nevada |
Proceedings, refereed
Admission Control for DiffServ Based Quality of Service in Cut-Through Networks
In Proceedings of the 10th International Conference on High Performance Computing (HiPC 2003). Lecture Notes in Computer Science. Heidelberg: Springer, 2003. Status: Published
Admission Control for DiffServ Based Quality of Service in Cut-Through Networks
Previous work on Quality of Service in cut-through networks shows that resource reservation mechanisms are only effective below the saturation point. Admission control in these networks will therefore need to keep network utilization below the saturation point, while still utilising the network resources to the maximum extent possible. In this paper we propose and evaluate three admission control schemes. Two of these use a centralised bandwidth broker, while the third is a distributed measurement based approach. We combine these admission control schemes with a DiffServ based QoS scheme for virtual cut-through networks to achieve class based bandwidth and latency guarantees. Our simulations show that the measurement based approach favoured in the Internet communities performs poorly in cut-through networks. Furthermore, they demonstrate that detailed knowledge of link utilization improves performance significantly.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2003 |
Conference Name | Proceedings of the 10th International Conference on High Performance Computing (HiPC 2003) |
Pagination | 118-129 |
Date Published | December 17-20 |
Publisher | Springer |
Place Published | Heidelberg |
Applying the DiffServ Model on Cut-Through Networks
In Proceedings of the 2003 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA2003). Las Vegas, Nevada: CSREA Press, 2003. Status: Published
Applying the DiffServ Model on Cut-Through Networks
Understanding the nature of traffic in high-speed communication systems is essential for achieving QoS in these networks. A first step towards this goal is understanding how basic QoS mechanisms work and affect network predictability before we introduce more complex mechanisms such as admission control. In this paper we analyse the effect of a DiffServ inspired QoS concept applied to virtual cut-through networks. The main findings from our study are that (i) throughput differentiation can be achieved by weighting of virtual lanes (VLs) and by classifying VLs as either low or high priority, (ii) the balance between VL weighting and VL load is not crucial when the network is operating below the saturation point, and (iii) jitter, however, is large, and good jitter characteristics seem unachievable with such a relative scheme.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2003 |
Conference Name | Proceedings of the 2003 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA2003) |
Pagination | 1089-1095 |
Date Published | June 23 - 26 |
Publisher | CSREA Press |
Place Published | Las Vegas, Nevada |
Proceedings, refereed
Topologies and Routing in Gigabit Switching Fabrics
In Proceedings of 2nd International Conference on Communications in Computing (CIC2001). Las Vegas, Nevada: CSREA Press, 2001. Status: Published
Topologies and Routing in Gigabit Switching Fabrics
Cluster networks will serve as the future access networks for multimedia streaming, massive multiplayer online gaming, e-commerce, network storage, etc. For those application areas, provisioning of Quality of Service (QoS) is becoming an important issue. DiffServ as specified by the IETF is foreseen to be the most prominent concept for providing predictability in the future Internet. To enable seamless interoperation with the higher level IETF concepts, the QoS architecture of the lower layers should comply with the DiffServ paradigm as well. Previous work on predictability in cut-through networks has only studied class based QoS. In this paper we set out to achieve flow level QoS using flow aware admission control in combination with a flow negligent DiffServ inspired QoS mechanism. Our results show that flow level bandwidth guarantees are achievable with the use of the Link-by-Link and the Probe based schemes. In addition, we are able to achieve an order of magnitude improvement in jitter and latency in individual flows.
Affiliation | Communication Systems |
Project(s) | No Simula project |
Publication Type | Proceedings, refereed |
Year of Publication | 2001 |
Conference Name | Proceedings of 2nd International Conference on Communications in Computing (CIC2001) |
Pagination | 142-149 |
Date Published | June 25 - 28 |
Publisher | CSREA Press |
Place Published | Las Vegas, Nevada |