Publications
Journal Article
Enabling unstructured-mesh computation on massively tiled AI processors: An example of accelerating in silico cardiac simulation
Frontiers in Physics 11 (2023). Status: Published
A new trend in processor architecture design is the packaging of thousands of small processor cores into a single device, where there is no device-level shared memory but each core has its own local memory. Thus, both the work and data of an application code need to be carefully distributed among the small cores, also called tiles. In this paper, we investigate how numerical computations that involve unstructured meshes can be efficiently parallelized and executed on a massively tiled architecture. Graphcore IPUs are chosen as the target hardware platform, to which we port an existing monodomain solver that simulates cardiac electrophysiology over realistic 3D irregular heart geometries. There are two computational kernels in this simulator: a 3D diffusion equation is discretized over an unstructured mesh and numerically approximated by repeatedly executing sparse matrix-vector multiplications (SpMVs), and an individual system of ordinary differential equations (ODEs) is explicitly integrated per mesh cell. We demonstrate how a new style of programming that uses Poplar/C++ can be used to port these commonly encountered computational tasks to Graphcore IPUs. In particular, we describe a per-tile data structure that is adapted to facilitate the inter-tile data exchange needed for parallelizing the SpMVs. We also study the achievable performance of the ODE solver, which depends heavily on special mathematical functions, as well as the accuracy of these functions on Graphcore IPUs. Moreover, topics related to using multiple IPUs and performance analysis are addressed. In addition to demonstrating an impressive level of performance that can be achieved by IPUs for monodomain simulation, we also provide a discussion on the generic theme of parallelizing and executing unstructured-mesh multiphysics computations on massively tiled hardware.
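The idea of a per-tile data structure with inter-tile exchange can be illustrated with a small sketch. This is not the paper's Poplar/C++ implementation: it is a plain-Python emulation that assumes a contiguous row partition, a CSR matrix, and a "halo" of remote vector entries appended after each tile's owned entries. All function names (`partition_rows`, `build_tile`, `exchange_halo`, `tiled_spmv`) are hypothetical.

```python
# Sketch (assumed layout, not the paper's code): y = A @ x with CSR rows
# split across "tiles". Each tile holds its owned slice of x plus a halo
# of remote entries that its rows reference.

def partition_rows(n_rows, n_tiles):
    """Split rows into contiguous, nearly equal blocks [lo, hi)."""
    base, extra = divmod(n_rows, n_tiles)
    bounds, start = [], 0
    for t in range(n_tiles):
        stop = start + base + (1 if t < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def build_tile(row_ptr, col_idx, vals, lo, hi):
    """Extract the local CSR block for rows [lo, hi) and remap columns.

    Owned columns map to 0..(hi-lo-1); halo columns follow after them.
    """
    first, last = row_ptr[lo], row_ptr[hi]
    halo = sorted({c for c in col_idx[first:last] if not lo <= c < hi})
    halo_pos = {c: (hi - lo) + i for i, c in enumerate(halo)}
    cols = [c - lo if lo <= c < hi else halo_pos[c]
            for c in col_idx[first:last]]
    ptr = [p - first for p in row_ptr[lo:hi + 1]]
    return {"lo": lo, "hi": hi, "ptr": ptr, "cols": cols,
            "vals": vals[first:last], "halo": halo}

def exchange_halo(tiles, x_owned):
    """Emulate the exchange step: fetch each halo entry from its owner tile."""
    def owner(c):
        return next(t for t, tile in enumerate(tiles)
                    if tile["lo"] <= c < tile["hi"])
    return [[x_owned[owner(c)][c - tiles[owner(c)]["lo"]]
             for c in tile["halo"]] for tile in tiles]

def tiled_spmv(tiles, x_owned):
    """Exchange halos, then run an independent local SpMV on every tile."""
    halos = exchange_halo(tiles, x_owned)
    y = []
    for tile, xo, xh in zip(tiles, x_owned, halos):
        x_local = xo + xh  # owned entries first, halo entries after
        for r in range(len(tile["ptr"]) - 1):
            y.append(sum(tile["vals"][k] * x_local[tile["cols"][k]]
                         for k in range(tile["ptr"][r], tile["ptr"][r + 1])))
    return y

# Usage: 4x4 tridiagonal (2, -1) matrix, two tiles, x = [1, 2, 3, 4].
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [2, -1, -1, 2, -1, -1, 2, -1, -1, 2]
x = [1, 2, 3, 4]
bounds = partition_rows(4, 2)
tiles = [build_tile(row_ptr, col_idx, vals, lo, hi) for lo, hi in bounds]
x_owned = [x[lo:hi] for lo, hi in bounds]
print(tiled_spmv(tiles, x_owned))  # [0, 0, 0, 5]
```

Remapping column indices once, at setup time, keeps the repeated SpMV loop free of ownership lookups; only the halo values move between tiles in each iteration.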
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, SparCity: An Optimization and Co-design Framework for Sparse Computation |
Publication Type | Journal Article |
Year of Publication | 2023 |
Journal | Frontiers in Physics |
Volume | 11 |
Date Published | 03/2023 |
Publisher | Frontiers |
ISSN | 2296-424X |
Keywords | hardware accelerator, heterogeneous computing, irregular meshes, scientific computation, scientific computation on MIMD processors, sparse matrix-vector multiplication (SpMV) |
URL | https://www.frontiersin.org/articles/10.3389/fphy.2023.979699/full |
DOI | 10.3389/fphy.2023.979699 |
Talks, invited
ML Accelerator Hardware: A Model for Parallel Sparse Computations?
In University of Vienna, Austria, 2022. Status: Published
Recently, dedicated accelerator hardware for machine learning applications, such as the Graphcore IPU and the Cerebras WSE, has evolved from the experimental stage into market-ready products, and it has the potential to constitute the next major architectural shift after GPUs saw widespread adoption a decade ago. In this talk we will present the new hardware along with implementations of basic graph and matrix algorithms and show some early results on the attainable performance, as well as the difficulties of establishing fair comparisons to other architectures. We follow up by discussing the implications of the architecture for algorithm design and programming, along with the wider implications of adopting such hardware.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2022 |
Location of Talk | University of Vienna, Austria |
ML Accelerator Hardware: A Model for Parallel Sparse Computations?
In SIAM ACDA, Aussois, France. Aussois: SIAM, 2022. Status: Published
Recently, dedicated accelerator hardware for machine learning applications, such as the Graphcore IPU and the Cerebras WSE, has evolved from the experimental stage into market-ready products, and it has the potential to constitute the next major architectural shift after GPUs saw widespread adoption a decade ago.
In this talk we will present the new hardware along with implementations of basic graph and matrix algorithms and show some early results on the attainable performance, as well as the difficulties of establishing fair comparisons to other architectures. We follow up by discussing the implications of the architecture for algorithm design and programming, along with the wider implications of adopting such hardware.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2022 |
Location of Talk | SIAM ACDA, Aussois, France |
Publisher | SIAM |
Place Published | Aussois |
Journal Article
The connectivity network underlying the German’s Twittersphere: a testbed for investigating information spreading phenomena
Scientific Reports 12, no. 1 (2022). Status: Published
Online social networks are ubiquitous, have billions of users, and produce large amounts of data. While platforms like Reddit are based on a forum-like organization where users gather around topics, Facebook and Twitter implement a concept in which individuals represent the primary entity of interest. This makes them natural testbeds for exploring individual behavior in large social networks. Underlying these individual-based platforms is a network whose “friend” or “follower” edges are of binary nature only and therefore do not necessarily reflect the level of acquaintance between pairs of users. In this paper, we present the network of acquaintance “strengths” underlying the German Twittersphere. To that end, we make use of the full non-verbal information contained in tweet–retweet actions to uncover the graph of social acquaintances among users, beyond pure binary edges. The social connectivity between pairs of users is weighted by keeping track of the frequency of shared content and the time elapsed between publication and sharing. Moreover, we also present a preliminary topological analysis of the German Twitter network. Finally, making available the data describing the weighted German Twitter network of acquaintances, we discuss how to apply this framework as a basis for investigating spreading phenomena of particular contents.
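A weighting that combines retweet frequency with elapsed time can be sketched as follows. The abstract does not state the formula used in the paper, so the exponential decay, the time constant `tau_hours`, the edge direction (retweeter to author), and the function name `acquaintance_weights` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def acquaintance_weights(retweets, tau_hours=24.0):
    """Accumulate a weight per (retweeter, author) pair.

    retweets: iterable of (author, retweeter, hours_elapsed) tuples.
    Each retweet contributes exp(-hours_elapsed / tau_hours), so frequent
    and fast retweets yield a larger weight (assumed form, not the paper's).
    """
    weights = defaultdict(float)
    for author, retweeter, hours_elapsed in retweets:
        weights[(retweeter, author)] += math.exp(-hours_elapsed / tau_hours)
    return dict(weights)

# Usage: "b" retweets "a" twice immediately and "c" once after 24 hours.
events = [("a", "b", 0.0), ("a", "b", 0.0), ("c", "b", 24.0)]
w = acquaintance_weights(events)
print(w[("b", "a")] > w[("b", "c")])  # True
```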
Affiliation | Communication Systems |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Enabling Graph Neural Networks at Exascale |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Scientific Reports |
Volume | 12 |
Issue | 1 |
Date Published | Jan-12-2022 |
Publisher | Nature Publishing Group |
URL | https://www.nature.com/articles/s41598-022-07961-3 |
DOI | 10.1038/s41598-022-07961-3 |
Proceedings, refereed
iPUG for multiple Graphcore IPUs: Optimizing performance and scalability of parallel breadth-first search
In 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC). Bangalore, India: IEEE, 2021. Status: Published
Parallel graph algorithms have become one of the principal applications of high-performance computing besides numerical simulations and machine learning workloads. However, due to their highly unstructured nature, graph algorithms remain extremely challenging for most parallel systems, with large gaps between observed performance and theoretical limits. Furthermore, most mainstream architectures rely heavily on single instruction multiple data (SIMD) processing for high floating-point rates, which is not beneficial for graph processing, which instead requires high memory bandwidth, low memory latency, and efficient processing of unstructured data. On the other hand, we are currently observing an explosion of new hardware architectures, many of which are adapted to specific purposes and diverge from traditional designs. A notable example is the Graphcore Intelligence Processing Unit (IPU), which is developed to meet the needs of upcoming machine intelligence applications. Its design eschews the traditional cache hierarchy, relying on SRAM as its main memory instead. The result is an extremely high-bandwidth, low-latency memory at the cost of capacity. In addition, the IPU consists of a large number of independent cores, allowing for true multiple instruction multiple data (MIMD) processing. Together, these features suggest that such a processor is well suited for graph processing. We test the limits of graph processing on multiple IPUs by implementing a low-level, high-performance code for breadth-first search (BFS), following the specifications of Graph500, the most widely used benchmark for parallel graph processing. Despite the simplicity of the BFS algorithm, implementing efficient parallel codes for it has proven to be a challenging task in the past. We show that our implementation scales well on a system with 8 IPUs and attains roughly twice the performance of an equal number of NVIDIA V100 GPUs using state-of-the-art CUDA code.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, SparCity: An Optimization and Co-design Framework for Sparse Computation |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC) |
Pagination | 162-171 |
Date Published | 12/2021 |
Publisher | IEEE |
Place Published | Bangalore, India |
DOI | 10.1109/HiPC53243.2021.00030 |
iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs
In High Performance Computing. ISC High Performance 2021. Vol. LNCS, volume 12728. Cham: Springer International Publishing, 2021. Status: Published
The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on the traditional caching hierarchies. Developed to meet the need for more and more data-centric applications, such as machine learning, IPUs combine a dedicated portion of SRAM with each of their numerous cores, resulting in high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising field of experimentation for graph algorithms since it is the unpredictable, irregular memory accesses that lead to performance losses in traditional processors with pre-caching.
This paper aims to test the IPU’s suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark that is not only a subroutine for a variety of more complex graph algorithms, but also allows comparability across a wide range of architectures.
We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to 4× over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about 1.5× on most test instances.
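For readers unfamiliar with the benchmark kernel, a Graph500-style BFS returns a parent array describing the search tree rather than only a visit order. The following is a minimal sequential, level-synchronous sketch of that idea. It is not the iPUG implementation; the adjacency-list input and the function name `bfs_parents` are assumptions for illustration.

```python
def bfs_parents(adj, root):
    """Level-synchronous BFS returning a parent array.

    adj: list of neighbor lists, one per vertex.
    parent[v] is the vertex from which v was first reached, parent[root]
    is root itself, and -1 marks vertices unreachable from root.
    """
    parent = [-1] * len(adj)
    parent[root] = root
    frontier = [root]
    while frontier:
        next_frontier = []
        for u in frontier:
            # Irregular accesses: neighbors may live anywhere in memory.
            for v in adj[u]:
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent

# Usage: a 4-cycle (0-1-3-2-0) plus an isolated vertex 4.
adj = [[1, 2], [0, 3], [0, 3], [1, 2], []]
print(bfs_parents(adj, 0))  # [0, 0, 0, 1, -1]
```

The inner loop over `adj[u]` is where the data-dependent memory accesses discussed in the abstract arise: which entries of `parent` are touched depends entirely on the graph.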
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | High Performance Computing. ISC High Performance 2021 |
Volume | LNCS, volume 12728 |
Pagination | 291-309 |
Publisher | Springer International Publishing |
Place Published | Cham |
ISBN Number | 978-3-030-78712-7 |
ISSN Number | 0302-9743 |
Keywords | BFS, Graph500, IPU, Performance optimization |
URL | https://link.springer.com/10.1007/978-3-030-78713-4 |
DOI | 10.1007/978-3-030-78713-4 |
Proceedings, refereed
A Scalable System for Bundling Online Social Network Mining Research
In 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE, 2020. Status: Published
Online social networks such as Facebook and Twitter are part of the everyday life of millions of people. They are not only used for interaction but play an essential role when it comes to information acquisition and knowledge gain. The abundance and detail of the accumulated data in these online social networks open up new possibilities for social researchers and psychologists, allowing them to study behavior in a large test population. However, complex application programming interfaces (APIs) and data scraping restrictions are, in many cases, a limiting factor when accessing this data. Furthermore, research projects are typically granted restricted access based on quotas. Thus, research tools such as scrapers that access social network data through an API must manage these quotas. While this is generally feasible, it becomes a problem when more than one tool, or multiple instances of the same tool, is being used in the same research group. Since different tools typically cannot balance access to a shared quota on their own, additional software is needed to prevent the individual tools from overusing the shared quota. In this paper, we present a proxy server that manages several researchers' data contingents in a cooperative research environment and thus enables a transparent view of a subset of Twitter's API. Our proxy scales linearly with the number of clients in use and imposes almost no performance penalty or implementation overhead on further layers or applications that need to work with the Twitter API. Thus, it allows seamless integration of multiple API-accessing programs within the same research group.
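The core problem, several tools drawing on one quota, can be illustrated with a small shared counter. This sketch is not the proxy described in the paper: it assumes a simple fixed-window rate limit, and the class `SharedQuota`, its parameters, and the injected `clock` are hypothetical names chosen for the example.

```python
import threading
import time

class SharedQuota:
    """Fixed-window request budget shared by all clients (assumed policy)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock            # injectable so tests can fake time
        self._lock = threading.Lock()
        self._window_start = clock()
        self._used = 0
        self.per_client = {}           # requests granted in the current window

    def try_acquire(self, client_id):
        """Return True and count the request if budget remains, else False."""
        with self._lock:
            now = self._clock()
            if now - self._window_start >= self._window:
                # A new window begins: reset the shared budget.
                self._window_start = now
                self._used = 0
                self.per_client.clear()
            if self._used >= self._limit:
                return False
            self._used += 1
            self.per_client[client_id] = self.per_client.get(client_id, 0) + 1
            return True

# Usage with a fake clock: two requests per 10-second window.
fake_now = [0.0]
quota = SharedQuota(limit=2, window_seconds=10, clock=lambda: fake_now[0])
results = [quota.try_acquire("scraper-a"),
           quota.try_acquire("scraper-b"),
           quota.try_acquire("scraper-a")]
fake_now[0] = 10.0
results.append(quota.try_acquire("scraper-a"))
print(results)  # [True, True, False, True]
```

Placing the counter behind a single lock in one process mirrors the motivation in the abstract: the individual tools do not need to know about each other, only about the component that holds the budget.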
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) |
Pagination | 1-6 |
Publisher | IEEE |
FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020
In MediaEval Challenge 2020. CEUR, 2020. Status: Published
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing, Department of Holistic Systems, UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | MediaEval Challenge 2020 |
Publisher | CEUR |
Resource Efficient Algorithms for Message Sampling in Online Social Networks
In The Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS-2020). IEEE, 2020. Status: Published
Sampling the network structure of online social networks is a widely discussed topic as it enables a wide variety of research in computational social science and associated fields. However, analyzing and sampling contentful messages still lacks effective solutions. Previous work for retrieving messages from social networks either used endpoints that are not available to the general research community or analyzed a predefined stream of messages. Our work uses features of the Twitter API to construct a data structure that optimizes the efficiency of requests sent to the social network. Moreover, we present a strategy for selecting users to sample, which improves the effectiveness of our query-optimizing data structure by leveraging existing models of user behavior. Combining our data structure with our proposed algorithm, we can achieve a 92% sampling efficiency over long timeframes.
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | The Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS-2020) |
Publisher | IEEE |