Projects
User-friendly programming of GPU-enhanced clusters

By developing a simple directive-based programming model, together with an accompanying fully automated source-to-source code translator and domain-specific optimizer, we aim to greatly simplify the task of programming scientific codes that run efficiently on accelerator-enhanced computer clusters. The project is motivated by an urgent need in the community of computational scientists for programming methodologies that are easy to use, yet capable of harnessing non-conventional computing resources, in particular the GPUs that dominate today's HPC field. Building on proof-of-concept work that has already successfully automated C-to-CUDA translation and optimization, so far restricted to the single-GPU scenario and stencil methods, the project aims to extend this success through the following topics:
- improving the newly developed directive-based programming model and its accompanying framework of automated code translation and optimization
- extending to the scenario of multiple GPUs
- extending to the scenario of GPU-accelerated CPU clusters
- tackling a number of real-world scientific codes
The project has the potential to considerably enhance the productivity of computational scientists, letting them focus on the scientific investigations at hand instead of spending precious time painstakingly writing complex codes.
Funding source:
Research Council of Norway, FRINATEK program
All partners:
- Simula Research Laboratory
- University of California, San Diego (UCSD)
- San Diego Supercomputer Center (SDSC)
- National University of Defense Technology (NUDT)
- SINTEF
Publications for User-friendly programming of GPU-enhanced clusters
Journal Article
Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
International Journal of Parallel Programming (2016). Status: Published
We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data movements by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
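The kernel class targeted by such a framework can be illustrated with a minimal sketch, one Jacobi-style sweep of a 7-point 3D stencil. NumPy is used here purely for illustration; the compiler's actual input is sequential, directive-annotated C.

```python
import numpy as np

def stencil_sweep(u):
    """One Jacobi-style sweep of a 7-point 3D stencil over interior points:
    each point becomes the average of itself and its six axis neighbours."""
    v = u.copy()
    v[1:-1, 1:-1, 1:-1] = (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1] +
                           u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1] +
                           u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:] +
                           u[1:-1, 1:-1, 1:-1]) / 7.0
    return v

u = np.zeros((8, 8, 8))
u[4, 4, 4] = 7.0      # a single point source
u = stencil_sweep(u)  # after one sweep the source has spread to its neighbours
```

In the real framework, the equivalent C loop nest would be annotated with directives and auto-translated into MPI+CUDA+OpenMP code with halo exchanges overlapped with computation.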
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | International Journal of Parallel Programming |
Date Published | 10/2016 |
Publisher | ACM/Springer |
Keywords | code generation, code optimisation, CPU+GPU computing, CUDA, heterogeneous computing, MPI, OpenMP, source-to-source translation, stencil computation |
DOI | 10.1007/s10766-016-0454-1 |
Accelerating Detailed Tissue-Scale 3D Cardiac Simulations Using Heterogeneous CPU-Xeon Phi Computing
International Journal of Parallel Programming (2016): 1-23. Status: Published
We investigate heterogeneous computing, which involves both multicore CPUs and manycore Xeon Phi coprocessors, as a new strategy for computational cardiology. In particular, 3D tissues of the human cardiac ventricle are studied with a physiologically realistic model that has 10,000 calcium release units per cell and 100 ryanodine receptors per release unit, together with tissue-scale simulations of the electrical activity and calcium handling. In order to attain resource-efficient use of heterogeneous computing systems that consist of both CPUs and Xeon Phis, we first direct the coding effort at ensuring good performance on the two types of compute devices individually. Although SIMD code vectorization is the main theme of performance programming, the actual implementation details differ considerably between CPU and Xeon Phi. Moreover, in addition to combined OpenMP+MPI programming, a suitable division of the cells between the CPUs and Xeon Phis is important for resource-efficient usage of an entire heterogeneous system. Numerical experiments show that good resource utilization is indeed achieved and that such a heterogeneous simulator paves the way for ultimately understanding the mechanisms of arrhythmia. The uncovered good programming practices can be used by computational scientists who want to adopt similar heterogeneous hardware platforms for a wide variety of applications.
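The "suitable division of the cells between the CPUs and Xeon Phis" mentioned above can be sketched in its simplest static form, assuming per-device throughputs have been measured beforehand (the numbers below are hypothetical):

```python
def partition_cells(n_cells, throughputs):
    """Split n_cells among compute devices proportionally to their measured
    throughputs (cells per second) -- a simple static load balance."""
    total = sum(throughputs)
    counts = [int(n_cells * t / total) for t in throughputs]
    counts[-1] += n_cells - sum(counts)  # give any rounding remainder to the last device
    return counts

# e.g. a CPU processing 40k cells/s alongside a coprocessor at 60k cells/s
counts = partition_cells(1_000_000, [40_000, 60_000])
```

A production simulator would refine such a split dynamically, but the proportional rule already captures the resource-efficiency idea.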
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | International Journal of Parallel Programming |
Pagination | 1-23 |
Date Published | 10/2016 |
Publisher | ACM/Springer |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing, Xeon Phi |
DOI | 10.1007/s10766-016-0461-2 |
Proceedings, refereed
Enabling Tissue-Scale Cardiac Simulations Using Heterogeneous Computing on Tianhe-2
In IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). ACM/IEEE, 2016. Status: Published
We develop a simulator for 3D tissue of the human cardiac ventricle with a physiologically realistic cell model and deploy it on the supercomputer Tianhe-2. In order to attain the full performance of the heterogeneous CPU-Xeon Phi design, we use carefully optimized codes for both devices and combine them to obtain suitable load balancing. Using a large number of nodes, we are able to perform tissue-scale simulations of the electrical activity and calcium handling in millions of cells, at a level of detail that tracks the states of trillions of ryanodine receptors. We can thus simulate arrhythmogenic spiral waves and other complex arrhythmogenic patterns which arise from calcium handling deficiencies in human cardiac ventricle tissue. Due to extensive code tuning and parallelization via OpenMP, MPI, and SCIF/COI, large-scale simulations of 10 heartbeats can be performed in a matter of hours. Test results indicate excellent scalability, thus paving the way for detailed whole-heart simulations on future generations of leadership-class supercomputers.
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) |
Pagination | 843-852 |
Date Published | 12/2016 |
Publisher | ACM/IEEE |
ISSN Number | 1521-9097 |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing, Xeon Phi |
DOI | 10.1109/ICPADS.2016.0114 |
Talks, invited
Heterogeneous HPC solutions in cardiac electrophysiology
In Lawrence Berkeley National Laboratory, Berkeley, CA, USA, 2016. Status: Published
Detailed simulations of electrical signal transmission in the human heart require immense processing power, thereby creating the need for large scale parallel implementations. We present two heterogeneous codes solving such problems, focusing on the interaction between OpenMP, MPI, and CUDA in irregular computations, and discuss practical experiences on different supercomputers.
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2016 |
Location of Talk | Lawrence Berkeley National Laboratory, Berkeley, CA, USA |
Meeting Exascale Computing with Source-to-Source Compilers

Future computing platforms are expected to be heterogeneous in architecture, that is, consisting of conventional CPUs and powerful hardware accelerators. The hardware heterogeneity, combined with the huge scale of these future platforms, will make the task of programming extremely difficult.
To overcome the programming challenge for the important class of scientific computations that are based on meshes, this project aims to develop two fully automated source-to-source compilers. These two compilers will help computational scientists to quickly prepare implementations of, respectively, implicit and explicit mesh-based computations for truly heterogeneous and resource-efficient execution on CPU+accelerator computing platforms. Two real-world simulators from computational cardiology will be used as testbeds for the fully automated compilers.
The success of such real-world heterogeneous simulations will not only verify the usefulness of the source-to-source compilers, but more importantly will allow unprecedented resolution and fidelity when investigating the particular topics of heart failure and arrhythmia.
Funding source
The Research Council of Norway (IKTPLUSS)
Partners
- University of California, San Diego
- Imperial College London
- Oslo University Hospital
Publications for Meeting Exascale Computing with Source-to-Source Compilers
Journal Article
On memory traffic and optimisations for low-order finite element assembly algorithms on multi-core CPUs
ACM Transactions on Mathematical Software 48, no. 2 (2022): 1-31. Status: Published
Motivated by the wish to understand the achievable performance of finite element assembly on unstructured computational meshes, we dissect the standard cellwise assembly algorithm into four kernels, two of which are dominated by irregular memory traffic. Several optimisation schemes are studied together with associated lower and upper bounds on the estimated memory traffic volume. Apart from properly reordering the mesh entities, the two most significant optimisations include adopting a lookup table in adding element matrices or vectors to their global counterparts, and using a row-wise assembly algorithm for multi-threaded parallelisation. Rigorous benchmarking shows that, due to the various optimisations, the actual volumes of memory traffic are in many cases very close to the estimated lower bounds. These results confirm the effectiveness of the optimisations, while also providing a recipe for developing efficient software for finite element assembly.
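The lookup-table optimisation described above can be sketched as follows. This is a simplified illustration of the general technique, not the authors' implementation: per cell, the position of each element-matrix entry in the CSR values array is precomputed once, so assembly reduces to direct indexed additions.

```python
import numpy as np

def build_csr_pattern(cells, n_dofs):
    """Build a CSR sparsity pattern from cell-to-dof connectivity."""
    cols_per_row = [set() for _ in range(n_dofs)]
    for dofs in cells:
        for i in dofs:
            cols_per_row[i].update(dofs)
    indptr, indices = [0], []
    for r in range(n_dofs):
        indices.extend(sorted(cols_per_row[r]))
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices)

def build_lookup(cells, indptr, indices):
    """Precompute, for every cell, the positions in the CSR values array
    where its element-matrix entries go, replacing repeated searches
    during assembly with direct table lookups."""
    lookup = []
    for dofs in cells:
        lookup.append([[indptr[i] + int(np.searchsorted(indices[indptr[i]:indptr[i + 1]], j))
                        for j in dofs] for i in dofs])
    return lookup

def assemble(cells, element_matrices, indptr, indices, lookup):
    """Cellwise assembly of the global CSR values via the lookup table."""
    vals = np.zeros(len(indices))
    for dofs, A_e, pos in zip(cells, element_matrices, lookup):
        for a in range(len(dofs)):
            for b in range(len(dofs)):
                vals[pos[a][b]] += A_e[a][b]
    return vals

# two 1D linear elements sharing dof 1; both have the same element matrix
cells = [(0, 1), (1, 2)]
A_e = [[1.0, -1.0], [-1.0, 1.0]]
indptr, indices = build_csr_pattern(cells, 3)
lookup = build_lookup(cells, indptr, indices)
vals = assemble(cells, [A_e, A_e], indptr, indices, lookup)
```

The shared dof accumulates contributions from both cells, exactly as in the global stiffness matrix of a 1D Laplacian.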
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | ACM Transactions on Mathematical Software |
Volume | 48 |
Issue | 2 |
Number | 19 |
Pagination | 1–31 |
Date Published | 05/2022 |
Publisher | Association for Computing Machinery (ACM) |
ISSN | 0098-3500 |
DOI | 10.1145/3503925 |
PhD Thesis
High-performance finite element computations: Performance modelling, optimisation, GPU acceleration & automated code generation
PhD thesis, University of Oslo, Oslo, Norway, 2021. Status: Published
Computer experiments have become a valuable tool for investigating various physical and biological processes described by partial differential equations (PDEs), such as weather forecasting or modelling the mechanical behaviour of cardiac tissue. Finite element methods are a class of numerical methods for solving PDEs that are often preferred, but these methods are rather difficult to implement correctly, let alone efficiently.
This thesis investigates the performance of several key computational kernels involved in finite element methods. First, a performance model is developed to better understand sparse matrix-vector multiplication, which is central to solving linear systems of equations that arise during finite element calculations. Second, the process of assembling linear systems is considered through careful benchmarking and analysis of the memory traffic involved. This results in clear guidelines for finite element assembly on shared-memory multicore CPUs.
Finally, hardware accelerators are incorporated by extending the FEniCS PDE solver framework to carry out assembly and solution of linear systems on a graphics processing unit (GPU). Example problems show that GPU-accelerated finite element solvers can exhibit substantial speedup over optimised multicore CPU codes. Moreover, the use of automated code generation makes these techniques much more accessible to domain scientists and non-experts.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | PhD Thesis |
Year of Publication | 2021 |
Degree awarding institution | University of Oslo |
Degree | PhD |
Number of Pages | 132 |
Date Published | 09/2020 |
Publisher | University of Oslo |
Place Published | Oslo, Norway |
Other Numbers | ISSN 1501-7710 |
Poster
Automated Code Generation for GPU-Based Finite Element Computations in FEniCS
SIAM Conference on Computational Science and Engineering (CSE21). SIAM, 2021. Status: Published
Developing high-performance finite element codes normally requires hand-crafting and fine tuning of computational kernels, which is not an easy task to carry out for each and every problem. Automated code generation has proved to be a highly productive alternative for frameworks like FEniCS, where a compiler is used to automatically generate suitable kernels from high-level mathematical descriptions of finite element problems. This strategy has so far enabled users to develop and run a variety of high-performance finite element solvers on clusters of multicore CPUs. We have recently enhanced FEniCS with GPU acceleration by enabling its internal compiler to generate CUDA kernels that are needed to offload finite element calculations to GPUs, particularly the assembly of linear systems. This poster presents the results of GPU-accelerating FEniCS and explores performance characteristics of auto-generated CUDA kernels and GPU-based assembly of linear systems for finite element methods.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2021 |
Date Published | 03/2021 |
Publisher | SIAM |
Place Published | SIAM Conference on Computational Science and Engineering (CSE21) |
Journal Article
Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
Journal of Parallel and Distributed Computing 144 (2020): 189-205. Status: Published
Parallel computations with irregular memory access patterns are often limited by the memory subsystems of multi-core CPUs, though it can be difficult to pinpoint and quantify performance bottlenecks precisely. We present a method for estimating volumes of data traffic caused by irregular, parallel computations on multi-core CPUs with memory hierarchies containing both private and shared caches. Further, we describe a performance model based on these estimates that applies to bandwidth-limited computations. As a case study, we consider two standard algorithms for sparse matrix–vector multiplication, a widely used, irregular kernel. Using three different multi-core CPU systems and a set of matrices that induce a range of irregular memory access patterns, we demonstrate that our cache simulation combined with the proposed performance model accurately quantifies performance bottlenecks that would not be detected using standard best- or worst-case estimates of the data traffic volume.
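The case-study kernel, and the flavor of the best/worst-case traffic estimates that the paper improves upon, can be sketched as follows. The 8-byte values and 4-byte indices are illustrative assumptions; the paper's cache simulation is far more precise than this toy bound.

```python
import numpy as np

def spmv_csr(indptr, indices, vals, x):
    """Sparse matrix-vector multiply y = A @ x with A stored in CSR format --
    the irregular access pattern is the indirect read x[indices[k]]."""
    y = np.zeros(len(indptr) - 1)
    for r in range(len(y)):
        for k in range(indptr[r], indptr[r + 1]):
            y[r] += vals[k] * x[indices[k]]
    return y

def traffic_bounds(n_rows, nnz, w=8, i=4):
    """Naive best/worst-case main-memory traffic (bytes) for square CSR SpMV:
    values, column indices, row pointers and y are always streamed; each x
    element is read once (perfect caching) or once per nonzero (no caching)."""
    streamed = nnz * (w + i) + n_rows * (i + w)
    return streamed + n_rows * w, streamed + nnz * w

y = spmv_csr([0, 1, 2], [0, 1], [2.0, 3.0], [1.0, 1.0])  # 2x2 diagonal example
lo, hi = traffic_bounds(n_rows=1000, nnz=5000)
```

The gap between `lo` and `hi` is exactly the uncertainty that the paper's cache simulation pins down for real matrices.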
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 144 |
Pagination | 189--205 |
Date Published | 06/2020 |
Publisher | Elsevier |
ISSN | 0743-7315 |
Keywords | AMD Epyc, Cache simulation, Intel Xeon, Performance model, Sparse matrix–vector multiplication |
URL | http://www.sciencedirect.com/science/article/pii/S0743731520302999 |
DOI | 10.1016/j.jpdc.2020.05.020 |
Poster
Towards detailed Organ-Scale Simulations in Cardiac Electrophysiology
GPU Technology Conference (GTC), Silicon Valley, San Jose, USA, 2020. Status: Published
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2020 |
Place Published | GPU Technology Conference (GTC), Silicon Valley, San Jose, USA |
Type of Work | Poster |
Proceedings, refereed
Karp-Sipser based Kernels for Bipartite Graph Matching
In Algorithm Engineering and Experiment (ALENEX). Society for Industrial and Applied Mathematics, 2020. Status: Published
We consider Karp–Sipser, a well-known matching heuristic, in the context of data reduction for the maximum cardinality matching problem. We describe an efficient implementation as well as modifications to reduce its time complexity on worst-case instances, both in theory and in practical cases. We compare experimentally against its widely used simpler variant and show cases for which the full algorithm yields better performance.
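The simple, widely used Karp–Sipser variant mentioned above can be sketched compactly: repeatedly match a degree-1 vertex with its unique neighbour (an always-safe choice), and when none exists, match an arbitrary remaining edge. The paper's contribution concerns the full algorithm and efficient implementations, which this toy version does not capture.

```python
def karp_sipser(adj):
    """Simple Karp-Sipser heuristic for bipartite matching.
    `adj` maps each left vertex to an iterable of right neighbours."""
    nbrs = {}
    for u, vs in adj.items():
        nbrs.setdefault(('L', u), set())
        for v in vs:
            nbrs[('L', u)].add(('R', v))
            nbrs.setdefault(('R', v), set()).add(('L', u))
    pairs = []

    def drop(x):                       # delete vertex x from the graph
        for y in nbrs.pop(x, set()):
            nbrs[y].discard(x)

    while nbrs:
        deg1 = [x for x, ys in nbrs.items() if len(ys) == 1]
        if deg1:
            x = deg1[0]                # safe move: match a degree-1 vertex
        else:
            x = next(iter(nbrs))
            if not nbrs[x]:            # isolated vertex: just remove it
                drop(x)
                continue
        y = next(iter(nbrs[x]))
        pairs.append((x, y))
        drop(x)
        drop(y)
    return {(a if a[0] == 'L' else b)[1]: (b if a[0] == 'L' else a)[1]
            for a, b in pairs}

# a path graph L0-R0-L1-R1-L2-R2: the heuristic finds a perfect matching
matching = karp_sipser({0: [0], 1: [0, 1], 2: [1, 2]})
```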
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | Algorithm Engineering and Experiment (ALENEX) |
Pagination | 134-145 |
Publisher | Society for Industrial and Applied Mathematics |
Journal Article
Performance optimization and modeling of fine-grained irregular communication in UPC
Scientific Programming 2019 (2019): Article ID 6825728. Status: Published
The UPC programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory sub-systems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. The programmer friendliness, however, can come at the cost of substantial performance penalties. This is especially true when indirectly indexing the elements of a shared array, for which the induced between-thread data communication can be irregular and have a fine-grained pattern. In this paper we study performance enhancement strategies specifically targeting such fine-grained irregular communication in UPC. Starting from explicit thread privatization, continuing with block-wise communication, and arriving at message condensing and consolidation, we obtained considerable performance improvement of UPC programs that originally require fine-grained irregular communication. Besides the performance enhancement strategies, the main contribution of the present paper is to propose performance models for the different scenarios, in the form of quantifiable formulas that hinge on the actual volumes of various data movements plus a small number of easily obtainable hardware characteristic parameters. These performance models help to verify the enhancements obtained, while also providing insightful predictions of similar parallel implementations, not limited to UPC, that also involve between-thread or between-process irregular communication. As a further validation, we also apply our performance modeling methodology and hardware characteristic parameters to an existing UPC code for solving a 2D heat equation on a uniform mesh.
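The message condensing and consolidation idea can be illustrated with a small sketch, assuming a block-distributed shared array (the `block` parameter and the request format are hypothetical): instead of issuing one fine-grained fetch per remote element, the requested indices are grouped by owning thread so each owner is contacted with a single consolidated message.

```python
def consolidate_requests(indices, block):
    """Condense fine-grained remote reads: group the requested element
    indices by their owning thread, where element i is owned by thread
    i // block under a block distribution."""
    per_owner = {}
    for i in indices:
        per_owner.setdefault(i // block, []).append(i)
    return per_owner

# five fine-grained requests collapse into three consolidated messages
reqs = consolidate_requests([3, 17, 5, 18, 90], block=10)
```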
Affiliation | Scientific Computing |
Project(s) | PREAPP: PRoductivity and Energy-efficiency through Abstraction-based Parallel Programming , Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Journal Article |
Year of Publication | 2019 |
Journal | Scientific Programming |
Volume | 2019 |
Pagination | Article ID 6825728 |
Date Published | 03/2019 |
Publisher | Hindawi |
Keywords | Fine-grained irregular communication, performance modeling, Performance optimization, Sparse matrix-vector multiplication, UPC programming language |
URL | https://www.hindawi.com/journals/sp/2019/6825728/ |
DOI | 10.1155/2019/6825728 |
Poster
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
International Conference in Computing in Cardiology, Singapore, 2019. Status: Published
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively.
Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
Fortunately, the cost of computing has fallen dramatically in the past decade. A prominent reason for this is the recent introduction of manycore processors such as GPUs, which by now power the majority of the world’s leading supercomputers. These devices owe their success to the fact that they are optimized for massively parallel workloads, such as applying similar ODE kernel computations to millions of mesh elements in scientific computing applications. Unlike CPUs, which are typically optimized for sequential performance, this allows GPU architectures to dedicate more transistors to performing computations, thereby increasing parallel speed and energy efficiency.
In this poster, we present ongoing work on the parallelization of finite volume computations over an unstructured mesh as well as the challenges involved in building scalable simulation codes and discuss the steps needed to close the gap to accurate real-time computations.
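The data-parallel pattern described above, the same ODE update applied independently to millions of cells, can be sketched with a hypothetical model ODE dv/dt = -v/tau. On a GPU each cell's update would map to one thread; here NumPy vectorization stands in for that hardware parallelism.

```python
import numpy as np

def euler_step_all_cells(v, dt, tau=1.0):
    """Apply the same explicit-Euler update of the model ODE dv/dt = -v/tau
    to every cell at once -- the massively parallel workload that manycore
    processors are optimized for."""
    return v - dt * v / tau

v = np.full(1_000_000, 2.0)        # one state value per mesh cell
v = euler_step_all_cells(v, dt=0.1)
```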
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2019 |
Date Published | 09/2019 |
Place Published | International Conference in Computing in Cardiology, Singapore |
Proceedings, refereed
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
In Computing in Cardiology. Vol. 46. IEEE, 2019. Status: Published
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively. Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
Fortunately, the cost of computing has fallen dramatically in the past decade. A prominent reason for this is the recent introduction of manycore processors such as GPUs, which by now power the majority of the world’s leading supercomputers. These devices owe their success to the fact that they are optimized for massively parallel workloads, such as applying similar ODE kernel computations to millions of mesh elements in scientific computing applications. Unlike CPUs, which are typically optimized for sequential performance, this allows GPU architectures to dedicate more transistors to performing computations, thereby increasing parallel speed and energy efficiency.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | Computing in Cardiology |
Volume | 46 |
Date Published | 12/2019 |
Publisher | IEEE |
Combining algorithmic rethinking and AVX-512 intrinsics for efficient simulation of subcellular calcium signaling
In International Conference on Computational Science (ICCS 2019). Springer, 2019. Status: Published
Calcium signaling is vital for the contraction of the heart. Physiologically realistic simulation of this subcellular process requires nanometer resolutions and a complicated mathematical model of differential equations. Since the subcellular space is composed of several irregularly-shaped and intricately-connected physiological domains with distinct properties, one particular challenge is to correctly compute the diffusion-induced calcium fluxes between the physiological domains. The common approach is to pre-calculate the effective diffusion coefficients between all pairs of neighboring computational voxels, and store them in large arrays. Such a strategy avoids complicated if-tests when looping through the computational mesh, but suffers from substantial memory overhead. In this paper, we adopt a memory-efficient strategy that uses a small lookup table of diffusion coefficients. The memory footprint and traffic are both drastically reduced, while also avoiding the if-tests. However, the new strategy induces more instructions on the processor level. To offset this potential performance pitfall, we use AVX-512 intrinsics to effectively vectorize the code. Performance measurements on a Knights Landing processor and a quad-socket Skylake server show a clear performance advantage of the manually vectorized implementation that uses lookup tables, over the counterpart using coefficient arrays.
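A minimal sketch of the lookup-table strategy (the domain types and coefficient values below are hypothetical): a small table indexed by the domain types of two neighbouring voxels replaces large precomputed per-face coefficient arrays, shrinking the memory footprint while avoiding if-tests in the sweep.

```python
import numpy as np

# hypothetical domain types per voxel (e.g. cytosol=0, SR=1, wall=2) and a
# small symmetric table of effective diffusion coefficients between types;
# a wall (type 2) blocks diffusion entirely
D = np.array([[1.0, 0.3, 0.0],
              [0.3, 0.5, 0.0],
              [0.0, 0.0, 0.0]])

def flux_1d(conc, dom, dx=1.0):
    """Diffusive flux between neighbouring voxels along one axis, with the
    effective coefficient fetched from the small table D[dom_i, dom_j]
    instead of a large precomputed per-face coefficient array."""
    coeff = D[dom[:-1], dom[1:]]              # one table lookup per face
    return coeff * (conc[1:] - conc[:-1]) / dx

conc = np.array([1.0, 0.0, 0.0, 4.0])
dom  = np.array([0, 0, 2, 0])                 # a wall voxel in the middle
f = flux_1d(conc, dom)                        # no flux across the wall faces
```

The gather `D[dom_i, dom_j]` is exactly the kind of indexed load that the paper vectorizes with AVX-512 intrinsics.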
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | International Conference on Computational Science (ICCS 2019) |
Pagination | 681-687 |
Publisher | Springer |
DOI | 10.1007/978-3-030-22750-0_66 |
Enabling Graph Neural Networks at Exascale

Graph neural networks (GNNs), which extend the successful ideas of deep learning to irregularly structured data, are a recent addition to the field of artificial intelligence. While traditional deep learning has focused on regular inputs such as images composed of pixels in two-dimensional space, graph neural networks can analyze and learn from unstructured connections between objects. This gives GNNs the ability to tackle completely new classes of problems, such as analyzing social networks and power grids or uncovering molecule structures in computational chemistry. Some experts in the field also believe that graph networks, due to their capacity for combinatorial generalization, represent an important next step towards the development of general artificial intelligence. However, such tasks require vast amounts of computation, which can only be provided by parallel processing.
It is well known that parallel computation for irregular problems is much more challenging than for regular ones, and GNNs are no exception. While traditional deep learning has been scaled up to run on entire supercomputers efficiently, GNNs currently do not scale to multiple processors. This proposal aims to overcome this limitation by drawing upon decades of experience in scalable graph algorithms and sparse linear algebra and adapting techniques that are proven to be effective for distributing graph computations over large parallel systems to GNNs.
We aim to create a new computational framework that allows users to specify a GNN while the framework handles the task of distributing graphs over parallel machines, as well as automatically selecting and running the algorithms that are best suited for the computation. Recently, frameworks such as TensorFlow have made traditional deep neural networks accessible to a large number of users. In the same way, our goal is to create a proof-of-concept framework that will be a crucial factor in the successful real-world application of GNNs.
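The message-passing computation at the core of a GNN layer, which such a framework would distribute over parallel machines, can be sketched as follows (mean aggregation and ReLU are chosen purely for illustration):

```python
import numpy as np

def gnn_layer(h, edges, W_self, W_nbr):
    """One message-passing layer: each node averages its in-neighbours'
    feature vectors, combines the aggregate with its own state through
    two weight matrices, and applies a ReLU nonlinearity.
    `edges` is a list of directed (src, dst) pairs -- the irregular,
    graph-structured part that makes parallelization hard."""
    n = h.shape[0]
    agg = np.zeros_like(h)
    deg = np.zeros(n)
    for s, d in edges:
        agg[d] += h[s]
        deg[d] += 1
    agg /= np.maximum(deg, 1)[:, None]                 # mean over in-neighbours
    return np.maximum(h @ W_self + agg @ W_nbr, 0.0)   # ReLU

h = np.eye(3)                         # 3 nodes with one-hot features
edges = [(0, 1), (2, 1), (1, 2)]
out = gnn_layer(h, edges, np.eye(3), np.eye(3))
```

The aggregation loop is structurally a sparse matrix-vector product, which is why techniques from scalable graph algorithms and sparse linear algebra carry over.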
Partners
- The University of Bergen
Publications for Enabling Graph Neural Networks at Exascale
Journal Article
The connectivity network underlying the German’s Twittersphere: a testbed for investigating information spreading phenomena
Scientific Reports 12, no. 1 (2022). Status: Published
Online social networks are ubiquitous, have billions of users, and produce large amounts of data. While platforms like Reddit are based on a forum-like organization where users gather around topics, Facebook and Twitter implement a concept in which individuals represent the primary entity of interest. This makes them natural testbeds for exploring individual behavior in large social networks. Underlying these individual-based platforms is a network whose “friend” or “follower” edges are of binary nature only and therefore do not necessarily reflect the level of acquaintance between pairs of users. In this paper, we present the network of acquaintance “strengths” underlying the German Twittersphere. To that end, we make use of the full non-verbal information contained in tweet–retweet actions to uncover the graph of social acquaintances among users, beyond pure binary edges. The social connectivity between pairs of users is weighted by keeping track of the frequency of shared content and the time elapsed between publication and sharing. Moreover, we also present a preliminary topological analysis of the German Twitter network. Finally, by making available the data describing the weighted German Twitter network of acquaintances, we discuss how to apply this framework as a basis for investigating spreading phenomena of particular contents.
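The weighting idea, frequency of shared content combined with the time elapsed between publication and sharing, can be sketched with an illustrative scoring function (this is not the paper's exact formula; the exponential decay and the `tau` time scale are assumptions):

```python
import math

def acquaintance_weight(retweet_delays_h, tau=24.0):
    """Weight of a directed acquaintance edge built from tweet-retweet
    actions: each retweet contributes more the sooner it follows the
    original tweet (delays in hours, exponential decay with scale tau)."""
    return sum(math.exp(-d / tau) for d in retweet_delays_h)

w_fast = acquaintance_weight([0.5, 1.0, 2.0])   # three quick retweets
w_slow = acquaintance_weight([48.0])            # one slow retweet
```

Frequent, prompt resharing thus yields a heavier edge than rare or late resharing, going beyond binary follower links.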
Afilliation | Communication Systems |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Enabling Graph Neural Networks at Exascale |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Scientific Reports |
Volume | 12 |
Issue | 1 |
Date Published | Jan-12-2022 |
Publisher | Nature Publishing Group |
URL | https://www.nature.com/articles/s41598-022-07961-3 |
DOI | 10.1038/s41598-022-07961-3 |
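The weighting scheme sketched in the abstract (frequency of shared content combined with the elapsed time between publication and sharing) can be illustrated as follows. The exponential decay and half-life below are purely illustrative assumptions, not the functional form used in the paper:

```python
import math

def acquaintance_weight(retweet_delays_secs, half_life_secs=3600.0):
    """Illustrative edge weight between two users: each retweet contributes
    a score that decays with the delay between publication and sharing, so
    many fast retweets yield a strong acquaintance edge."""
    lam = math.log(2) / half_life_secs
    return sum(math.exp(-lam * d) for d in retweet_delays_secs)

# Three quick retweets outweigh one retweet a day later.
fast = acquaintance_weight([60, 120, 300])
slow = acquaintance_weight([86400])
print(fast > slow)  # True
```

Summing per-interaction scores captures frequency, while the decay captures recency of reaction; any monotonically decreasing kernel would serve the same illustrative purpose.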
Proceedings, refereed
A Streaming System for Large-scale Temporal Graph Mining of Reddit Data
In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Lyon, France: IEEE, 2022. Status: Published
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing , Enabling Graph Neural Networks at Exascale |
Publication Type | Proceedings, refereed |
Year of Publication | 2022 |
Conference Name | 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
Pagination | 1153-1162 |
Publisher | IEEE |
Place Published | Lyon, France |
URL | https://ieeexplore.ieee.org/document/9835250/ |
DOI | 10.1109/IPDPSW55747.2022.00189 |
Proceedings, refereed
Incremental Clustering Algorithms for Massive Dynamic Graphs
In International Conference on Data Mining Workshops (ICDMW). Auckland, New Zealand: IEEE, 2021. Status: Published
We consider the problem of incremental graph clustering, where the graph to be clustered is given as a sequence of disjoint subsets of the edge set. The problem appears when dealing with graphs that are created over time, such as online social networks where new users appear continuously, or protein interaction networks when new proteins are discovered. For very large graphs, it is computationally too expensive to repeatedly apply standard clustering algorithms. Instead, algorithms whose time complexity in every step depends only on the size of the incoming subset of edges are needed. At the same time, such algorithms should find clusterings whose quality is close to that produced by offline algorithms. In this paper, we discuss the computational model and present an incremental clustering algorithm. We test the algorithm's performance and quality on a wide variety of instances. Our results show that the algorithm far outperforms offline algorithms in speed while retaining a large fraction of their clustering quality.
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing , Enabling Graph Neural Networks at Exascale |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | International Conference on Data Mining Workshops (ICDMW) |
Pagination | 360-369 |
Date Published | 12/2021 |
Publisher | IEEE |
Place Published | Auckland, New Zealand |
ISBN Number | 978-1-6654-2427-1 |
ISSN Number | 2375-9259 |
URL | https://ieeexplore.ieee.org/abstract/document/9679843 |
DOI | 10.1109/ICDMW53433.2021.00051 |
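The computational model described in the abstract, where each step may only spend time proportional to the incoming edge batch, can be illustrated with a toy union-find sketch. Note that union-find merges connected components rather than quality-driven clusters, so this shows only the cost model, not the paper's algorithm:

```python
class IncrementalClusters:
    """Toy incremental clustering via union-find: processing a batch costs
    time roughly proportional to the batch size, never to the whole graph."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_batch(self, edges):
        for u, v in edges:                 # cost depends only on the batch
            ru, rv = self.find(u), self.find(v)
            if ru != rv:
                self.parent[ru] = rv

    def same_cluster(self, u, v):
        return self.find(u) == self.find(v)

c = IncrementalClusters()
c.add_batch([(1, 2), (3, 4)])
c.add_batch([(2, 3)])
print(c.same_cluster(1, 4))  # True
```

An actual incremental clustering algorithm would additionally decide, per incoming edge, whether merging clusters improves a quality objective; the per-batch cost profile is the part this sketch preserves.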
Talk, keynote
Explaining News Spreading Phenomena in Social Networks
In BI Norwegian Business School, 2021. Status: Published
Digital wildfires are fast-spreading online misinformation phenomena with the potential to cause harm in the physical world. They have been identified as a considerable risk to developed societies, which raises the need to better understand online misinformation phenomena in order to mitigate that risk. We approach the problem from an interdisciplinary angle, with the aim of using large-scale analysis of social network data to test hypotheses about the behavior of social network users interacting with misinformation. We discuss state-of-the-art techniques for capturing large volumes of communication data from social networks such as Twitter, as well as collections of news such as GDELT. Based on that, we describe new methods for measuring the reach and the typical target audience of media outlets and social network participants. Doing so allows hypotheses such as the existence of filter bubbles to be tested against large amounts of real-world data. Finally, we discuss how the detection of anomalies in typical news spreading patterns can be used to detect disinformation campaigns and digital wildfires.
Affiliation | Communication Systems |
Project(s) | Enabling Graph Neural Networks at Exascale, UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talk, keynote |
Year of Publication | 2021 |
Location of Talk | BI Norwegian Business School |
UMOD: Understanding and Monitoring Digital Wildfires
In recent years, digital wildfires, i.e., fast-spreading online misinformation, have been identified as a considerable risk to developed societies, raising the need for strategies to alleviate that risk. However, the speed with which online information spreads today, combined with its immense volume, makes human monitoring of the Internet completely infeasible, which gives rise to the need for an automated system.
Our project aims to develop improved prevention and preparedness techniques to counteract this type of misinformation. While several approaches have been developed in the recent past, almost all of them attack the problem purely from the technical side, generally using machine-learning techniques. Our approach differs in that we study the problem from both sides: the technical side, but also the human side, by performing experiments and interviews aimed at understanding how people assess trustworthiness online, which content is likely to spread far, and why actors spread misinformation.
The five main objectives of UMOD
- Develop a computer program capable of detecting the topic of online news articles and the relationship between them.
- Perform experiments on how people assess the truthfulness of news items, and interview journalists on current fact-checking practices.
- Develop algorithms capable of analysing how news propagates from the original source.
- Analyse, using discourse analysis, the content and agendas of the misinformation discovered by the automated system, and assess its threat potential.
- Formulate detailed recommendations on how to best prepare for digital wildfires, based on the knowledge gathered in the four preceding points, and train the system to detect harmful misinformation early.
Final goal
The overall objective is the prevention of digital wildfires via automated early warnings from the system, as well as enhanced preparedness for such events through intensive study of how such wildfires spread and of the underlying reasons for the phenomenon.
Funding source
Research Council of Norway (RCN); detailed information on the UMOD grant is available in Forskningsbanken.
Project leader
Johannes Langguth
Project partners
Publications for UMOD: Understanding and Monitoring Digital Wildfires
Journal Article
COVID-19 and 5G conspiracy theories: Long term observation of a digital wildfire
International Journal of Data Science and Analytics (2022). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | International Journal of Data Science and Analytics |
Publisher | Springer |
The connectivity network underlying the German’s Twittersphere: a testbed for investigating information spreading phenomena
Scientific Reports 12, no. 1 (2022). Status: Published
Online social networks are ubiquitous, have billions of users, and produce large amounts of data. While platforms like Reddit are based on a forum-like organization where users gather around topics, Facebook and Twitter implement a concept in which individuals represent the primary entity of interest. This makes them natural testbeds for exploring individual behavior in large social networks. Underlying these individual-based platforms is a network whose “friend” or “follower” edges are of a binary nature only and therefore do not necessarily reflect the level of acquaintance between pairs of users. In this paper, we present the network of acquaintance “strengths” underlying the German Twittersphere. To that end, we make use of the full non-verbal information contained in tweet–retweet actions to uncover the graph of social acquaintances among users, beyond pure binary edges. The social connectivity between pairs of users is weighted by keeping track of the frequency of shared content and the time elapsed between publication and sharing. Moreover, we also present a preliminary topological analysis of the German Twitter network. Finally, making the data describing the weighted German Twitter network of acquaintances available, we discuss how to apply this framework as a basis for investigating the spreading of particular contents.
Affiliation | Communication Systems |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Enabling Graph Neural Networks at Exascale |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Scientific Reports |
Volume | 12 |
Issue | 1 |
Date Published | Jan-12-2022 |
Publisher | Nature Publishing Group |
URL | https://www.nature.com/articles/s41598-022-07961-3 |
DOI | 10.1038/s41598-022-07961-3 |
Journal Article
Don't Trust Your Eyes: Image Manipulation in the Age of DeepFakes
Frontiers in Communication 6 (2021). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Communication |
Volume | 6 |
Publisher | Frontiers Media SA |
Place Published | Lausanne, Switzerland |
URL | https://www.frontiersin.org/articles/10.3389/fcomm.2021.632317/full |
DOI | 10.3389/fcomm.2021.632317 |
Blinded by emotions: The association between emotional reactivity and trust in fictitious news stories on crime
Studia Psychologica: International Journal for Research and Theory in Psychological Sciences 63, no. 4 (2021): 404-416. Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Studia Psychologica: International Journal for Research and Theory in Psychological Sciences |
Volume | 63 |
Issue | 4 |
Pagination | 404-416 |
Date Published | 12/2021 |
Publisher | Institute of Experimental Psychology, Centre of Social and Psychological Sciences |
DOI | 10.31577/sp.2021.04.833 |
What should I trust? Individual differences in attitudes to conflicting information and misinformation on COVID-19
Frontiers in Psychology 12 (2021). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Psychology |
Volume | 12 |
Publisher | Frontiers Media SA |
URL | https://www.frontiersin.org/articles/10.3389/fpsyg.2021.588478/full |
DOI | 10.3389/fpsyg.2021.588478 |
Cognitive predictors of precautionary behaviour during the COVID-19 pandemic
Frontiers in Psychology 12 (2021). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Psychology |
Volume | 12 |
Publisher | Frontiers Media SA |
URL | https://www.frontiersin.org/articles/10.3389/fpsyg.2021.589800/full |
DOI | 10.3389/fpsyg.2021.589800 |
An alternative correct answer to the Cognitive Reflection Test
Frontiers in Psychology 12 (2021). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Psychology |
Volume | 12 |
Publisher | Frontiers Media SA |
URL | https://www.frontiersin.org/articles/10.3389/fpsyg.2021.662222/full |
DOI | 10.3389/fpsyg.2021.662222 |
Talk, keynote
Explaining News Spreading Phenomena in Social Networks
In BI Norwegian Business School, 2021. Status: Published
Digital wildfires are fast-spreading online misinformation phenomena with the potential to cause harm in the physical world. They have been identified as a considerable risk to developed societies, which raises the need to better understand online misinformation phenomena in order to mitigate that risk. We approach the problem from an interdisciplinary angle, with the aim of using large-scale analysis of social network data to test hypotheses about the behavior of social network users interacting with misinformation. We discuss state-of-the-art techniques for capturing large volumes of communication data from social networks such as Twitter, as well as collections of news such as GDELT. Based on that, we describe new methods for measuring the reach and the typical target audience of media outlets and social network participants. Doing so allows hypotheses such as the existence of filter bubbles to be tested against large amounts of real-world data. Finally, we discuss how the detection of anomalies in typical news spreading patterns can be used to detect disinformation campaigns and digital wildfires.
Affiliation | Communication Systems |
Project(s) | Enabling Graph Neural Networks at Exascale, UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talk, keynote |
Year of Publication | 2021 |
Location of Talk | BI Norwegian Business School |
Talks, contributed
Will technical means help in preventing digital wildfires?
In 6th World Conference on Media and Mass Communication, Cagliari, Italy, 2021. Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2021 |
Location of Talk | 6th World Conference on Media and Mass Communication, Cagliari, Italy |
How adolescents and senior citizens evaluate fake news
In 8th biennial European Communication Conference, Braga, Portugal, 2021. Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2021 |
Location of Talk | 8th biennial European Communication Conference, Braga, Portugal |
Publications
Journal Article
Enabling unstructured-mesh computation on massively tiled AI processors: An example of accelerating in silico cardiac simulation
Frontiers in Physics 11 (2023). Status: Published
A new trend in processor architecture design is the packaging of thousands of small processor cores into a single device, where there is no device-level shared memory but each core has its own local memory. Thus, both the work and data of an application code need to be carefully distributed among the small cores, also termed tiles. In this paper, we investigate how numerical computations that involve unstructured meshes can be efficiently parallelized and executed on a massively tiled architecture. Graphcore IPUs are chosen as the target hardware platform, to which we port an existing monodomain solver that simulates cardiac electrophysiology over realistic 3D irregular heart geometries. There are two computational kernels in this simulator, where a 3D diffusion equation is discretized over an unstructured mesh and numerically approximated by repeatedly executing sparse matrix-vector multiplications (SpMVs), whereas an individual system of ordinary differential equations (ODEs) is explicitly integrated per mesh cell. We demonstrate how a new style of programming that uses Poplar/C++ can be used to port these commonly encountered computational tasks to Graphcore IPUs. In particular, we describe a per-tile data structure that is adapted to facilitate the inter-tile data exchange needed for parallelizing the SpMVs. We also study the achievable performance of the ODE solver that heavily depends on special mathematical functions, as well as their accuracy on Graphcore IPUs. Moreover, topics related to using multiple IPUs and performance analysis are addressed. In addition to demonstrating an impressive level of performance that can be achieved by IPUs for monodomain simulation, we also provide a discussion on the generic theme of parallelizing and executing unstructured-mesh multiphysics computations on massively tiled hardware.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing , SparCity: An Optimization and Co-design Framework for Sparse Computation |
Publication Type | Journal Article |
Year of Publication | 2023 |
Journal | Frontiers in Physics |
Volume | 11 |
Date Published | 03/2023 |
Publisher | Frontiers |
ISSN | 2296-424X |
Keywords | hardware accelerator, heterogenous computing, irregular meshes, scientific computation, scientific computation on MIMD processors, sparse matrix-vector multiplication (SpMV) |
URL | https://www.frontiersin.org/articles/10.3389/fphy.2023.979699/full |
DOI | 10.3389/fphy.2023.979699 |
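The per-tile data structure and inter-tile exchange described in the abstract can be approximated in plain Python: each tile owns a block of rows of the sparse matrix plus a "halo" of remote vector entries it must gather before multiplying. This is a simplified serial sketch, not Graphcore's Poplar/C++ API:

```python
def spmv_tiled(tiles, x):
    """tiles: list of (rows, csr) pairs, where rows are the global row
    indices a tile owns and csr maps each owned row to its (col, val)
    entries. Before multiplying, each tile gathers the remote x entries
    it needs (the 'halo'), mimicking the explicit data exchange required
    on a chip with no device-level shared memory."""
    y = [0.0] * len(x)
    for rows, csr in tiles:
        needed = {c for r in rows for c, _ in csr[r]}       # halo indices
        local_x = {c: x[c] for c in needed}                 # gather/exchange
        for r in rows:
            y[r] = sum(v * local_x[c] for c, v in csr[r])   # local compute
    return y

# A 2x2 sparse matrix [[2, 1], [1, 0]] split row-wise across two tiles.
csr = {0: [(0, 2.0), (1, 1.0)], 1: [(0, 1.0)]}
tiles = [([0], csr), ([1], csr)]
print(spmv_tiled(tiles, [1.0, 3.0]))  # [5.0, 1.0]
```

On the real hardware the gather is a compiled exchange phase and the tiles run concurrently; the sketch only shows which data each tile must own or receive.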
Proceedings, refereed
A Streaming System for Large-scale Temporal Graph Mining of Reddit Data
In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Lyon, France: IEEE, 2022. Status: Published
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing , Enabling Graph Neural Networks at Exascale |
Publication Type | Proceedings, refereed |
Year of Publication | 2022 |
Conference Name | 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
Pagination | 1153-1162 |
Publisher | IEEE |
Place Published | Lyon, France |
URL | https://ieeexplore.ieee.org/document/9835250/ |
DOI | 10.1109/IPDPSW55747.2022.00189 |
Efficient Minimum Weight Vertex Cover Heuristics Using Graph Neural Networks
In 20th International Symposium on Experimental Algorithms (SEA 2022). Vol. 233. Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. Status: Published
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2022 |
Conference Name | 20th International Symposium on Experimental Algorithms (SEA 2022) |
Volume | 233 |
Pagination | 12:1–12:17 |
Publisher | Schloss Dagstuhl – Leibniz-Zentrum für Informatik |
Place Published | Dagstuhl, Germany |
ISBN Number | 978-3-95977-251-8 |
ISSN Number | 1868-8969 |
URL | https://drops.dagstuhl.de/opus/volltexte/2022/16546 |
DOI | 10.4230/LIPIcs.SEA.2022.12 |
Implementing Spatio-Temporal Graph Convolutional Networks on Graphcore IPUs
In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Lyon, France: IEEE, 2022. Status: Published
Affiliation | Machine Learning |
Project(s) | Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2022 |
Conference Name | 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
Pagination | 45-54 |
Publisher | IEEE |
Place Published | Lyon, France |
URL | https://ieeexplore.ieee.org/document/9835385/ |
DOI | 10.1109/IPDPSW55747.2022.00016 |
Journal Article
COVID-19 and 5G conspiracy theories: Long term observation of a digital wildfire
International Journal of Data Science and Analytics (2022). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | International Journal of Data Science and Analytics |
Publisher | Springer |
Impacts of Covid-19 on Norwegian salmon exports: A firm-level analysis
Aquaculture 561 (2022): 738678. Status: Published
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Aquaculture |
Volume | 561 |
Pagination | 738678 |
Date Published | Jan-12-2022 |
Publisher | Elsevier |
ISSN | 00448486 |
URL | https://www.sciencedirect.com/science/article/pii/S0044848622007955 |
DOI | 10.1016/j.aquaculture.2022.738678 |
The connectivity network underlying the German’s Twittersphere: a testbed for investigating information spreading phenomena
Scientific Reports 12, no. 1 (2022). Status: Published
Online social networks are ubiquitous, have billions of users, and produce large amounts of data. While platforms like Reddit are based on a forum-like organization where users gather around topics, Facebook and Twitter implement a concept in which individuals represent the primary entity of interest. This makes them natural testbeds for exploring individual behavior in large social networks. Underlying these individual-based platforms is a network whose “friend” or “follower” edges are of a binary nature only and therefore do not necessarily reflect the level of acquaintance between pairs of users. In this paper, we present the network of acquaintance “strengths” underlying the German Twittersphere. To that end, we make use of the full non-verbal information contained in tweet–retweet actions to uncover the graph of social acquaintances among users, beyond pure binary edges. The social connectivity between pairs of users is weighted by keeping track of the frequency of shared content and the time elapsed between publication and sharing. Moreover, we also present a preliminary topological analysis of the German Twitter network. Finally, making the data describing the weighted German Twitter network of acquaintances available, we discuss how to apply this framework as a basis for investigating the spreading of particular contents.
Affiliation | Communication Systems |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Enabling Graph Neural Networks at Exascale |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Scientific Reports |
Volume | 12 |
Issue | 1 |
Date Published | Jan-12-2022 |
Publisher | Nature Publishing Group |
URL | https://www.nature.com/articles/s41598-022-07961-3 |
DOI | 10.1038/s41598-022-07961-3 |
Talks, invited
Efficient Minimum Weight Vertex Cover Heuristics using Graph Neural Networks
In University of Vienna, Austria, 2022. Status: Published
Minimum weighted vertex cover is the NP-hard graph problem of choosing a subset of vertices incident to all edges such that the sum of the weights of the chosen vertices is minimum. Previous efforts for solving this in practice have typically been based on search-based iterative heuristics or exact algorithms that rely on reduction rules and branching techniques. Although exact methods have shown success in solving instances with up to millions of vertices efficiently, they are limited in practice due to the NP-hardness of the problem. We present a new hybrid method that combines elements from exact methods, iterative search, and graph neural networks (GNNs). More specifically, we first compute a greedy solution using reduction rules whenever possible. If no such rule applies, we consult a GNN model that selects a vertex that is likely to be in or out of the solution, potentially opening up further reductions. Finally, we use an improved local search strategy to enhance the solution further. Extensive experiments on graphs of up to a billion edges show that the proposed GNN-based approach finds better solutions than existing heuristics. Compared to exact solvers, the method produced solutions that are, on average, 0.04% away from the optimum while taking less time than all state-of-the-art alternatives.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2022 |
Location of Talk | University of Vienna, Austria |
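One classic reduction rule of the kind alluded to in the abstract can be made concrete: if a vertex weighs at least as much as all of its neighbours together, there is an optimal cover that takes the neighbours instead of the vertex. The sketch below applies only this single rule greedily; the talk's hybrid method combines many such rules with a GNN and local search:

```python
def neighbour_weight_reduction(adj, w):
    """Repeatedly apply one MWVC reduction rule: if w[v] >= total weight of
    v's uncovered neighbours, put those neighbours into the cover and
    delete v's edges. Returns the partial cover found this way."""
    cover = set()
    live = {v: set(ns) for v, ns in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(live):
            ns = live[v] - cover
            if ns and w[v] >= sum(w[u] for u in ns):
                cover |= ns
                for u in ns:              # v's edges are now all covered
                    live[u].discard(v)
                live[v] = set()
                changed = True
    return cover

# Star graph with a heavy centre and light leaves: take the leaves.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
w = {0: 10, 1: 1, 2: 1, 3: 1}
print(sorted(neighbour_weight_reduction(adj, w)))  # [1, 2, 3]
```

When no rule fires on any remaining vertex, the hybrid method would hand the residual graph to the GNN to pick a vertex, which may re-enable further reductions.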
ML Accelerator Hardware: A Model for Parallel Sparse Computations?
In University of Vienna, Austria, 2022. Status: Published
Recently, dedicated accelerator hardware for machine learning applications such as the Graphcore IPUs and Cerebras WSE has evolved from the experimental stage into market-ready products, and it has the potential to constitute the next major architectural shift after GPUs saw widespread adoption a decade ago. In this talk, we will present the new hardware along with implementations of basic graph and matrix algorithms and show some early results on the attainable performance, as well as the difficulties of establishing fair comparisons to other architectures. We follow up by discussing the wider implications of the architecture for algorithm design and programming, along with the wider implications of adopting such hardware.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2022 |
Location of Talk | University of Vienna, Austria |
ML Accelerator Hardware: A Model for Parallel Sparse Computations?
In SIAM ACDA, Aussois, France. Aussois: SIAM, 2022. Status: Published
Recently, dedicated accelerator hardware for machine learning applications such as the Graphcore IPUs and Cerebras WSE has evolved from the experimental stage into market-ready products, and it has the potential to constitute the next major architectural shift after GPUs saw widespread adoption a decade ago.
In this talk, we will present the new hardware along with implementations of basic graph and matrix algorithms and show some early results on the attainable performance, as well as the difficulties of establishing fair comparisons to other architectures. We follow up by discussing the wider implications of the architecture for algorithm design and programming, along with the wider implications of adopting such hardware.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2022 |
Location of Talk | SIAM ACDA, Aussois, France |
Publisher | SIAM |
Place Published | Aussois |
Journal Article
An alternative correct answer to the Cognitive Reflection Test
Frontiers in Psychology 12 (2021). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Psychology |
Volume | 12 |
Publisher | Frontiers Media SA |
URL | https://www.frontiersin.org/articles/10.3389/fpsyg.2021.662222/full |
DOI | 10.3389/fpsyg.2021.662222 |
Blinded by emotions: The association between emotional reactivity and trust in fictitious news stories on crime
Studia Psychologica: International Journal for Research and Theory in Psychological Sciences 63, no. 4 (2021): 404-416. Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Studia Psychologica: International Journal for Research and Theory in Psychological Sciences |
Volume | 63 |
Issue | 4 |
Pagination | 404-416 |
Date Published | 12/2021 |
Publisher | Institute of Experimental Psychology, Centre of Social and Psychological Sciences |
DOI | 10.31577/sp.2021.04.833 |
Don't Trust Your Eyes: Image Manipulation in the Age of DeepFakes
Frontiers in Communication 6 (2021). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Communication |
Volume | 6 |
Publisher | Frontiers Media SA |
Place Published | Lausanne, Switzerland |
URL | https://www.frontiersin.org/articles/10.3389/fcomm.2021.632317/full |
DOI | 10.3389/fcomm.2021.632317 |
What should I trust? Individual differences in attitudes to conflicting information and misinformation on COVID-19
Frontiers in Psychology 12 (2021). Status: Published
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Psychology |
Volume | 12 |
Publisher | Frontiers Media SA |
URL | https://www.frontiersin.org/articles/10.3389/fpsyg.2021.588478/full |
DOI | 10.3389/fpsyg.2021.588478 |
Poster
Automated Code Generation for GPU-Based Finite Element Computations in FEniCS
SIAM Conference on Computational Science and Engineering (CSE21): SIAM, 2021. Status: Published
Developing high-performance finite element codes normally requires hand-crafting and fine-tuning of computational kernels, which is not an easy task to carry out for each and every problem. Automated code generation has proved to be a highly productive alternative for frameworks like FEniCS, where a compiler is used to automatically generate suitable kernels from high-level mathematical descriptions of finite element problems. This strategy has so far enabled users to develop and run a variety of high-performance finite element solvers on clusters of multicore CPUs. We have recently enhanced FEniCS with GPU acceleration by enabling its internal compiler to generate CUDA kernels that are needed to offload finite element calculations to GPUs, particularly the assembly of linear systems. This poster presents the results of GPU-accelerating FEniCS and explores performance characteristics of auto-generated CUDA kernels and GPU-based assembly of linear systems for finite element methods.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2021 |
Date Published | 03/2021 |
Publisher | SIAM |
Place Published | SIAM Conference on Computational Science and Engineering (CSE21) |
Talk, keynote
Explaining News Spreading Phenomena in Social Networks
In BI Norwegian Business School, 2021. Status: Published
Explaining News Spreading Phenomena in Social Networks
Digital wildfires are fast-spreading online misinformation phenomena with the potential to cause harm in the physical world. They have been identified as a considerable risk to developed societies, which raises the need to better understand online misinformation phenomena in order to mitigate that risk. We approach the problem from an interdisciplinary angle, with the aim of using large-scale analysis of social network data to test hypotheses about the behavior of social network users interacting with misinformation. We discuss state-of-the-art techniques for capturing large volumes of communication data from social networks such as Twitter, as well as collections of news such as GDELT. Building on this, we describe new methods for measuring the reach and the typical target audience of media outlets and social network participants. This makes it possible to test hypotheses, such as the existence of filter bubbles, against large amounts of real-world data. Finally, we discuss how the detection of anomalies in typical news spreading patterns can be used to detect disinformation campaigns and digital wildfires.
Affiliation | Communication Systems |
Project(s) | Enabling Graph Neural Networks at Exascale, UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talk, keynote |
Year of Publication | 2021 |
Location of Talk | BI Norwegian Business School |
Proceedings, refereed
Explaining the Performance of Supervised and Semi-Supervised Methods for Automated Sparse Matrix Format Selection
In 50th International Conference on Parallel Processing Workshop. Chicago, Illinois, USA: ACM, 2021. Status: Published
Explaining the Performance of Supervised and Semi-Supervised Methods for Automated Sparse Matrix Format Selection
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing, SparCity: An Optimization and Co-design Framework for Sparse Computation |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | 50th International Conference on Parallel Processing Workshop |
Pagination | 1-10 |
Date Published | 08/2021 |
Publisher | ACM |
Place Published | Chicago, Illinois, USA |
Incremental Clustering Algorithms for Massive Dynamic Graphs
In International Conference on Data Mining Workshops (ICDMW). Auckland, New Zealand: IEEE, 2021. Status: Published
Incremental Clustering Algorithms for Massive Dynamic Graphs
We consider the problem of incremental graph clustering where the graph to be clustered is given as a sequence of disjoint subsets of the edge set. The problem appears when dealing with graphs that are created over time, such as online social networks where new users appear continuously, or protein interaction networks when new proteins are discovered. For very large graphs, it is computationally too expensive to repeatedly apply standard clustering algorithms. Instead, algorithms whose time complexity only depends on the size of the incoming subset of edges in every step are needed. At the same time, such algorithms should find clusterings whose quality is close to that produced by offline algorithms. In this paper, we discuss the computational model and present an incremental clustering algorithm. We test the algorithm's performance and clustering quality on a wide variety of instances. Our results show that the algorithm far outperforms offline algorithms in running time while retaining a large fraction of their clustering quality.
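The computational model above can be sketched with a toy label-adoption rule (far simpler than the algorithm in the paper; all names are illustrative): edges arrive in disjoint batches, and the work per step depends only on the batch size, never on the full graph.

```python
def incremental_cluster(batches):
    """Toy incremental clustering: a previously unseen vertex adopts the
    label of an already-clustered neighbour, or starts a new cluster.
    Each step touches only the incoming edge batch."""
    label = {}
    next_label = 0
    for batch in batches:
        for u, v in batch:
            for x, y in ((u, v), (v, u)):
                if x not in label:
                    if y in label:
                        label[x] = label[y]   # join neighbour's cluster
                    else:
                        label[x] = next_label  # open a new cluster
                        next_label += 1
    return label

labels = incremental_cluster([[(0, 1), (1, 2)], [(3, 4)]])
```

A real incremental algorithm would additionally merge or split clusters as later batches reveal connections, which this sketch deliberately omits.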
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing, Enabling Graph Neural Networks at Exascale |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | International Conference on Data Mining Workshops (ICDMW) |
Pagination | 360-369 |
Date Published | 12/2021 |
Publisher | IEEE |
Place Published | Auckland, New Zealand |
ISBN Number | 978-1-6654-2427-1 |
ISSN Number | 2375-9259 |
URL | https://ieeexplore.ieee.org/abstract/document/9679843 |
DOI | 10.1109/ICDMW53433.2021.00051 |
iPUG for multiple Graphcore IPUs: Optimizing performance and scalability of parallel breadth-first search
In 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC). Bangalore, India: IEEE, 2021. Status: Published
iPUG for multiple Graphcore IPUs: Optimizing performance and scalability of parallel breadth-first search
Parallel graph algorithms have become one of the principal applications of high-performance computing besides numerical simulations and machine learning workloads. However, due to their highly unstructured nature, graph algorithms remain extremely challenging for most parallel systems, with large gaps between observed performance and theoretical limits. Furthermore, most mainstream architectures rely heavily on single instruction multiple data (SIMD) processing for high floating-point rates, which is of little benefit for graph processing, which instead requires high memory bandwidth, low memory latency, and efficient processing of unstructured data. On the other hand, we are currently observing an explosion of new hardware architectures, many of which are adapted to specific purposes and diverge from traditional designs. A notable example is the Graphcore Intelligence Processing Unit (IPU), which is developed to meet the needs of upcoming machine intelligence applications. Its design eschews the traditional cache hierarchy, relying on SRAM as its main memory instead. The result is an extremely high-bandwidth, low-latency memory at the cost of capacity. In addition, the IPU consists of a large number of independent cores, allowing for true multiple instruction multiple data (MIMD) processing. Together, these features suggest that such a processor is well suited for graph processing. We test the limits of graph processing on multiple IPUs by implementing a low-level, high-performance code for breadth-first search (BFS), following the specifications of Graph500, the most widely used benchmark for parallel graph processing. Despite the simplicity of the BFS algorithm, implementing efficient parallel codes for it has proven to be a challenging task in the past. We show that our implementation scales well on a system with 8 IPUs and attains roughly twice the performance of an equal number of NVIDIA V100 GPUs using state-of-the-art CUDA code.
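The benchmark kernel itself can be sketched as a level-synchronous BFS that records a parent for every reached vertex, which is what Graph500 validates (a plain sequential Python sketch; the IPU code parallelises the frontier expansion across tiles):

```python
def bfs_parents(adj, source):
    """Level-synchronous BFS returning a parent map in the style used
    by Graph500 validation: the source is its own parent, and every
    other reached vertex records the vertex that discovered it."""
    parent = {source: source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:                 # the step parallelised on the IPU
            for v in adj.get(u, ()):
                if v not in parent:        # first discovery wins
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
parents = bfs_parents(adj, 0)
```

The irregular, data-dependent accesses into `parent` and `adj` are exactly the memory pattern that favours the IPU's low-latency SRAM over cache-based designs.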
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, SparCity: An Optimization and Co-design Framework for Sparse Computation |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC) |
Pagination | 162-171 |
Date Published | 12/2021 |
Publisher | IEEE |
Place Published | Bangalore, India |
DOI | 10.1109/HiPC53243.2021.00030 |
iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs
In High Performance Computing. ISC High Performance 2021. Vol. LNCS, volume 12728. Cham: Springer International Publishing, 2021. Status: Published
iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs
The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on traditional cache hierarchies. Developed to meet the needs of increasingly data-centric applications, such as machine learning, IPUs combine a dedicated portion of SRAM with each of their numerous cores, resulting in high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising field of experimentation for graph algorithms, since it is the unpredictable, irregular memory accesses that lead to performance losses in traditional processors with pre-caching.
This paper aims to test the IPU's suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark that serves not only as a subroutine for a variety of more complex graph algorithms, but also allows comparability across a wide range of architectures.
We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to 4× over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about 1.5× on most test instances.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | High Performance Computing. ISC High Performance 2021 |
Volume | LNCS, volume 12728 |
Pagination | 291-309 |
Publisher | Springer International Publishing |
Place Published | Cham |
ISBN Number | 978-3-030-78712-7 |
ISSN Number | 0302-9743 |
Keywords | BFS, Graph500, IPU, Performance optimization |
URL | https://link.springer.com/10.1007/978-3-030-78713-4 |
DOI | 10.1007/978-3-030-78713-4 |
Shared-memory Implementation of the Karp-Sipser Kernelization Process
In 28th edition of the IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2021). Bangalore, India: IEEE, 2021. Status: Published
Shared-memory Implementation of the Karp-Sipser Kernelization Process
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | 28th edition of the IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2021) |
Pagination | 71-80 |
Date Published | 12/2021 |
Publisher | IEEE |
Place Published | Bangalore, India |
WICO Graph: a Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets
In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021). Vol. 2. SCITEPRESS, 2021. Status: Published
WICO Graph: a Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets
In the wake of the COVID-19 pandemic, a surge of misinformation has flooded social media and other internet channels, and some of it has the potential to cause real-world harm. To counteract this misinformation, reliably identifying it is a principal problem to be solved. However, the identification of misinformation poses a formidable challenge for language processing systems, since the texts containing misinformation are short, work with insinuation rather than explicitly stating a false claim, or resemble other postings that deal with the same topic ironically. Accordingly, for the development of better detection systems, it is essential not only to use hand-labeled ground truth data but also to extend the analysis with methods beyond Natural Language Processing that consider the characteristics of the participants' relationships and the diffusion of misinformation. This paper presents a novel dataset that deals with a specific piece of misinformation: the idea that the 5G wireless network is causally connected to the COVID-19 pandemic. We have extracted the subgraphs of 3,000 manually classified Tweets from Twitter's follower network and divided them into three categories: subgraphs of Tweets that propagate the specific 5G misinformation, subgraphs of Tweets that spread other conspiracy theories, and subgraphs of Tweets that do neither. We created the WICO (Wireless Networks and Coronavirus Conspiracy) dataset to support experts in machine learning, graph processing, and related fields in studying the spread of misinformation. Furthermore, we provide a series of baseline experiments using both Graph Neural Networks and other established classifiers that use simple graph metrics as features. The dataset is available at https://datasets.simula.no/wico-graph
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021) |
Volume | 2 |
Pagination | 257-266 |
Publisher | SCITEPRESS |
ISBN Number | 978-989-758-484-8 |
DOI | 10.5220/0010262802570266 |
WICO Text: A Labeled Dataset of Conspiracy Theory and 5G-Corona Misinformation Tweets
In Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks (OASIS '21). ACM, 2021. Status: Published
WICO Text: A Labeled Dataset of Conspiracy Theory and 5G-Corona Misinformation Tweets
Affiliation | Machine Learning |
Project(s) | Department of High Performance Computing, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks (OASIS '21) |
Pagination | 21-25 |
Publisher | ACM |
Talks, contributed
How adolescents and senior citizens evaluate fake news
In 8th biennial European Communication Conference, Braga, Portugal, 2021. Status: Published
How adolescents and senior citizens evaluate fake news
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2021 |
Location of Talk | 8th biennial European Communication Conference, Braga, Portugal |
Motivated reasoning in the evaluation of news quality
In 32nd International Congress of Psychology, Praha, Czech Republic, 2021. Status: Published
Motivated reasoning in the evaluation of news quality
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2021 |
Location of Talk | 32nd International Congress of Psychology, Praha, Czech Republic |
Shared-Memory Implementation of the Karp-Sipser Kernelization Process
In SIAM ACDA, Richland (virtual), 2021. Status: Published
Shared-Memory Implementation of the Karp-Sipser Kernelization Process
We investigate the parallelization of the Karp-Sipser kernelization technique, which constitutes the central part of the well-known Karp-Sipser heuristic for the maximum cardinality matching problem. The technique reduces a given problem instance to a smaller but equivalent one, by a series of two operations: vertex removal, and merging/unifying two vertices. The operation of merging two vertices poses the principal challenge in parallelizing the technique. We describe an algorithm that minimizes the need for synchronization and present an efficient shared-memory parallel implementation. Using extensive experiments on a variety of multicore CPUs, we show that our implementation scales well up to 32 cores on one socket.
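The degree-1 part of the kernelization can be sketched as follows (a sequential toy in Python; the degree-2 merge rule, identified above as the hard part to parallelize, is deliberately omitted, and the names are illustrative):

```python
def karp_sipser_degree1(adj):
    """Karp-Sipser degree-1 rule: a vertex with exactly one neighbour is
    safely matched to that neighbour, both are removed, and removals may
    expose new degree-1 vertices, which are processed in turn."""
    adj = {u: set(vs) for u, vs in adj.items()}  # local mutable copy
    matching = []
    work = [u for u, vs in adj.items() if len(vs) == 1]
    while work:
        u = work.pop()
        if u not in adj or len(adj[u]) != 1:
            continue                     # stale entry: degree changed
        v = next(iter(adj[u]))
        matching.append((u, v))
        for w in (u, v):                 # remove both endpoints
            for x in adj.pop(w, ()):
                if x in adj:
                    adj[x].discard(w)
                    if len(adj[x]) == 1:
                        work.append(x)   # newly exposed degree-1 vertex
    return matching

matching = karp_sipser_degree1({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
```

On a path graph the rule alone already finds a maximum matching; the merge rule for degree-2 vertices is what requires the synchronization-minimizing algorithm described in the talk.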
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Talks, contributed |
Year of Publication | 2021 |
Location of Talk | SIAM ACDA, Richland (virtual) |
Sociodemographic attributes, media consumption and susceptibility to fake news
In 6th World Conference on Media and Mass Communication, Cagliari, Italy, 2021. Status: Published
Sociodemographic attributes, media consumption and susceptibility to fake news
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2021 |
Location of Talk | 6th World Conference on Media and Mass Communication, Cagliari, Italy |
Will technical means help in preventing digital wildfires?
In 6th World Conference on Media and Mass Communication, Cagliari, Italy, 2021. Status: Published
Will technical means help in preventing digital wildfires?
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2021 |
Location of Talk | 6th World Conference on Media and Mass Communication, Cagliari, Italy |
Talks, invited
Spreading Online Misinformation
In Data-SKUP 2021, 2021. Status: Published
Spreading Online Misinformation
Digital wildfires are fast-spreading online misinformation phenomena with the potential to cause harm in the physical world. They have been identified as a considerable risk to developed societies, which raises the need to better understand online misinformation phenomena in order to mitigate that risk. We approach the problem from an interdisciplinary angle, with the aim of using large-scale analysis of social network data to test hypotheses about the behavior of social network users interacting with misinformation. We discuss state-of-the-art techniques for capturing large volumes of communication data from social networks such as Twitter, as well as collections of news such as GDELT. Building on this, we describe new methods for measuring the reach and the typical target audience of media outlets and social network participants. This makes it possible to test hypotheses, such as the existence of filter bubbles, against large amounts of real-world data. Finally, we discuss how the detection of anomalies in typical news spreading patterns can be used to detect disinformation campaigns and digital wildfires.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, Simula Metropolitan Center for Digital Engineering |
Publication Type | Talks, invited |
Year of Publication | 2021 |
Location of Talk | Data-SKUP 2021 |
Proceedings, refereed
A Framework for Interaction-based Propagation Analysis in Online Social Networks
In Complex Networks. Springer, 2020. Status: Accepted
A Framework for Interaction-based Propagation Analysis in Online Social Networks
Online social networks create a digital footprint of human interaction naturally by the way they function. Thus, they allow a large-scale analysis of human behavior which was previously infeasible for social scientists. Consequently, social networks have been studied intensely in the last decade. The core of most social networks is the relationship between users, which can be described as a graph. The graph can be either undirected, as is the case for the friendship relation of Facebook, or directed, which is the case of the follower relation on Twitter. The relationship is readily visible, e.g. on the user interface of the social networks themselves. However, these edges are unweighted expressions of interest and reflect how individuals have chosen to relate to each other rather than how they actually interact with each other. For studying information propagation, comparing interaction properties is crucial, and, therefore, using models based on connections that reflect different dimensions and strengths of acquaintance seems appropriate. Thus, there is a need for obtaining weighted edges from the communication that occurs on the social network. In this paper, we present a novel method to calculate an acquaintance score between pairs of Twitter users and use the resulting networks to enable the analysis of interaction-based information propagation. By interpreting the frequency and speed with which individuals share content as a measure of acquaintance, it becomes possible to predict and compare communication patterns, and to detect unusual communication. In contrast to previous work which assigns edge weights based on tie strength, our score considers the response time as a crucial factor and, therefore, enables time-based spreading comparisons.
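A hedged sketch of how such a score might combine interaction frequency and response time (the exponential weighting and the time constant `tau` are illustrative assumptions, not the paper's actual formula):

```python
import math

def acquaintance_score(response_times_s, tau=3600.0):
    """Hypothetical acquaintance score between two users: every
    interaction contributes, and faster responses contribute more, so
    both frequency and response time enter the edge weight."""
    return sum(math.exp(-t / tau) for t in response_times_s)

# Two users who respond to each other within a minute look much more
# acquainted than two who respond after two hours:
close = acquaintance_score([60.0, 60.0])
distant = acquaintance_score([7200.0, 7200.0])
```

Any monotone decay in the response time would serve the same purpose; the key design point from the paper is that the weight is time-aware rather than a plain interaction count.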
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | Complex Networks |
Publisher | Springer |
Notes | Extended abstract |
A Scalable System for Bundling Online Social Network Mining Research
In 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE, 2020. Status: Published
A Scalable System for Bundling Online Social Network Mining Research
Online social networks such as Facebook and Twitter are part of the everyday life of millions of people. They are not only used for interaction but play an essential role when it comes to information acquisition and knowledge gain. The abundance and detail of the accumulated data in these online social networks open up new possibilities for social researchers and psychologists, allowing them to study behavior in a large test population. However, complex application programming interfaces (API) and data scraping restrictions are, in many cases, a limiting factor when accessing this data. Furthermore, research projects are typically granted restricted access based on quotas. Thus, research tools such as scrapers that access social network data through an API must manage these quotas. While this is generally feasible, it becomes a problem when more than one tool, or multiple instances of the same tool, is being used in the same research group. Since different tools typically cannot balance access to a shared quota on their own, additional software is needed to prevent the individual tools from overusing the shared quota. In this paper, we present a proxy server that manages several researchers' data contingents in a cooperative research environment and thus enables a transparent view of a subset of Twitter's API. Our proxy scales linearly with the number of clients in use and incurs almost no performance penalties or implementation overhead for further layers or applications that need to work with the Twitter API. Thus, it allows seamless integration of multiple API accessing programs within the same research group.
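The central mechanism can be sketched as a shared fixed-window counter that all client tools draw from (an illustrative toy, not the proxy's actual implementation; class and parameter names are hypothetical, and the 15-minute window mirrors Twitter's rate-limit windows):

```python
class SharedQuota:
    """Minimal shared-quota gate: several clients draw from one
    rate-limit window, so no single tool can overuse the group's quota."""

    def __init__(self, limit, window_s=900.0):
        self.limit = limit          # requests allowed per window
        self.window_s = window_s    # window length in seconds
        self.used = 0
        self.window_start = 0.0

    def try_acquire(self, now):
        if now - self.window_start >= self.window_s:
            self.window_start = now  # a new rate-limit window has begun
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True              # request may be forwarded to the API
        return False                 # client must wait for the next window

q = SharedQuota(limit=2, window_s=900.0)
```

In the proxy, all client requests pass through one such gate per API endpoint, which is what makes the quota balancing transparent to the individual tools.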
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) |
Pagination | 1-6 |
Publisher | IEEE |
A System for High Performance Mining on GDELT Data
In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2020. Status: Published
A System for High Performance Mining on GDELT Data
We design a system for efficient in-memory analysis of data from the GDELT database of news events. The specialization of the system allows us to avoid the inefficiencies of existing alternatives, and make full use of modern parallel high-performance computing hardware. We then present a series of experiments showcasing the system's ability to analyze correlations in the entire GDELT 2.0 database containing more than a billion news items. The results reveal large-scale trends in the world of today's online news.
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
Date Published | 05/2020 |
Publisher | IEEE |
Keywords | Data mining, GDELT, High Performance Computing, Misinformation, Publishing |
Evaluating Standard Classifiers for Detecting COVID-19 related Misinformation
In MediaEval 2020. CEUR, 2020. Status: Published
Evaluating Standard Classifiers for Detecting COVID-19 related Misinformation
Affiliation | Machine Learning |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | MediaEval 2020 |
Publisher | CEUR |
FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020
In MediaEval Challenge 2020. CEUR, 2020. Status: Published
FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020
Affiliation | Scientific Computing, Machine Learning |
Project(s) | Department of High Performance Computing, Department of Holistic Systems, UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | MediaEval Challenge 2020 |
Publisher | CEUR |
Karp-Sipser based Kernels for Bipartite Graph Matching
In Algorithm Engineering and Experiment (ALENEX). Society for Industrial and Applied Mathematics, 2020. Status: Published
Karp-Sipser based Kernels for Bipartite Graph Matching
We consider Karp–Sipser, a well-known matching heuristic, in the context of data reduction for the maximum cardinality matching problem. We describe an efficient implementation as well as modifications to reduce its time complexity in worst-case instances, both in theory and in practical cases. We compare experimentally against its widely used simpler variant and show cases for which the full algorithm yields better performance.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | Algorithm Engineering and Experiment (ALENEX) |
Pagination | 134-145 |
Publisher | Society for Industrial and Applied Mathematics |
Resource Efficient Algorithms for Message Sampling in Online Social Networks
In The Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS-2020). IEEE, 2020. Status: Published
Resource Efficient Algorithms for Message Sampling in Online Social Networks
Sampling the network structure of online social networks is a widely discussed topic, as it enables a wide variety of research in computational social science and associated fields. However, analyzing and sampling contentful messages still lacks effective solutions. Previous work for retrieving messages from social networks either used endpoints that are not available to the general research community or analyzed a predefined stream of messages. Our work uses features of the Twitter API to construct a data structure that optimizes the efficiency of requests sent to the social network. Moreover, we present a strategy for selecting users to sample, which improves the effectiveness of our query-optimizing data structure by leveraging existing models of user behavior. Combining our data structure with our proposed algorithm, we achieve a 92% sampling efficiency over long timeframes.
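The request-packing idea behind such a data structure can be sketched as grouping user ids into maximally full API calls (assuming a bulk-lookup endpoint that accepts up to 100 ids per request, as Twitter's user lookup did at the time; a toy, not the paper's actual structure):

```python
def pack_requests(user_ids, batch_size=100):
    """Group user ids into batches so that each API call is as full as
    possible, maximising the data returned per request against a quota.
    The batch size of 100 is an assumption modelled on Twitter's
    bulk-lookup limit."""
    ids = list(user_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

batches = pack_requests(range(250))
```

Selecting *which* users go into the batches, guided by models of posting behavior, is the second half of the paper's contribution and is not captured by this sketch.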
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | The Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS-2020) |
Publisher | IEEE |
Poster
A Framework for Interaction-based Propagation Analysis in Online Social Networks
Complex Networks, 2020. Status: Published
A Framework for Interaction-based Propagation Analysis in Online Social Networks
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, Department of Holistic Systems, UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Poster |
Year of Publication | 2020 |
Place Published | Complex Networks |
Efficient simulations of patient-specific electrical heart activity on the DGX-2
GPU Technology Conference (GTC) 2020, Silicon Valley, USA: Nvidia, 2020. Status: Published
Efficient simulations of patient-specific electrical heart activity on the DGX-2
Patients who have suffered a heart attack have an elevated risk of developing arrhythmia. The use of computer simulations of the electrical activity in the hearts of these patients is emerging as an alternative to the traditional, more invasive examinations performed by doctors today. Recent advances in personalised arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations of the electrical activity in the heart require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic, where diagnosis and treatment planning can be time sensitive, even if only because of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study them collaboratively. Overcoming these limitations requires speeding up computations from hours to seconds, which demands a massive increase in computational capabilities.
We have developed a code that is capable of performing highly efficient heart simulations on the DGX-2, making use of all 16 V100 GPUs. Using a patient-specific unstructured tetrahedral mesh with 11.7 million cells, we are able to simulate the electrical heart activity at 1/30 of real-time. Moreover, we are able to show that the throughput achieved using all 16 GPUs in the DGX-2 is 77.6% of the theoretical maximum.
We achieved this through extensive optimisations of the two kernels constituting the body of the main loop in the simulator. In the kernel solving the diffusion equation (governing the spread of the electrical signal), which consists of a sparse matrix-vector multiplication, we minimise the memory traffic by reordering the mesh (and matrix) elements into clusters that fit in the V100's L2 cache. In the kernel solving the cell model (describing the complex interactions of ion channels in the cell membrane), we apply sophisticated domain-specific optimisations to reduce the number of floating-point operations to the point where the kernel becomes memory bound. After optimisation, both kernels are memory bound, and we derive the minimum memory traffic, which we then divide by the aggregate memory bandwidth to obtain a lower bound on the execution time.
Topics discussed include optimisations for sparse matrix-vector multiplications, strategies for handling inter-device communication for unstructured meshes, and lessons we learnt while programming the DGX-2.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2020 |
Date Published | 03/2020 |
Publisher | Nvidia |
Place Published | GPU Technology Conference (GTC) 2020, Silicon Valley, USA |
Graph Structure Based Monitoring of Digital Wildfires
6th International Conference on Computational Social Science, Boston, MA, USA, 2020. Status: Published
Graph Structure Based Monitoring of Digital Wildfires
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Poster |
Year of Publication | 2020 |
Date Published | 07/2020 |
Place Published | 6th International Conference on Computational Social Science, Boston, MA, USA |
Type of Work | Poster |
Keywords | Digital wildfires, Graph neural networks, Large scale infrastructure, Misinformation, Social network analysis |
URL | http://2020.ic2s2.org/program |
Towards detailed Organ-Scale Simulations in Cardiac Electrophysiology
GPU Technology Conference (GTC), Silicon Valley, San Jose, USA, 2020. Status: Published
Towards detailed Organ-Scale Simulations in Cardiac Electrophysiology
Afilliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2020 |
Place Published | GPU Technology Conference (GTC), Silicon Valley, San Jose, USA |
Type of Work | Poster |
Journal Article
Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
Journal of Parallel and Distributed Computing 144 (2020): 189-205. Status: Published
Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
Parallel computations with irregular memory access patterns are often limited by the memory subsystems of multi-core CPUs, though it can be difficult to pinpoint and quantify performance bottlenecks precisely. We present a method for estimating volumes of data traffic caused by irregular, parallel computations on multi-core CPUs with memory hierarchies containing both private and shared caches. Further, we describe a performance model based on these estimates that applies to bandwidth-limited computations. As a case study, we consider two standard algorithms for sparse matrix–vector multiplication, a widely used, irregular kernel. Using three different multi-core CPU systems and a set of matrices that induce a range of irregular memory access patterns, we demonstrate that our cache simulation combined with the proposed performance model accurately quantifies performance bottlenecks that would not be detected using standard best- or worst-case estimates of the data traffic volume.
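The core idea of such a cache simulation can be illustrated with a toy, fully associative LRU cache that counts misses over an address trace; the paper's simulator models hierarchies with both private and shared caches, which this sketch does not attempt:

```python
from collections import OrderedDict

def simulate_misses(trace, num_lines, line_size=64):
    """Count misses of a fully associative LRU cache over a byte-address
    trace; the estimated data traffic is then misses * line_size bytes."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # refresh LRU position on a hit
        else:
            misses += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses
```

Feeding such a simulator the irregular access pattern of a sparse matrix-vector multiplication gives a traffic estimate that falls between the best- and worst-case bounds mentioned above.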
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 144 |
Pagination | 189--205 |
Date Published | 06/2020 |
Publisher | Elsevier |
ISSN | 0743-7315 |
Keywords | AMD Epyc, Cache simulation, Intel Xeon, Performance model, Sparse matrix–vector multiplication |
URL | http://www.sciencedirect.com/science/article/pii/S0743731520302999 |
DOI | 10.1016/j.jpdc.2020.05.020 |
Truth is in the eye of the beholder: Individual differences in the evaluation of fake and real news
Applied Cognitive Psychology (2020). Status: Submitted
Truth is in the eye of the beholder: Individual differences in the evaluation of fake and real news
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Applied Cognitive Psychology |
Publisher | Wiley |
Talks, contributed
Factors associated with trust in fake news among older adults
In 19th General Meeting of the European Association of Social Psychology, Kraków, Poland, 2020. Status: Accepted
Factors associated with trust in fake news among older adults
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2020 |
Location of Talk | 19th General Meeting of the European Association of Social Psychology, Kraków, Poland |
Fake news recognition in young and older adults
In 62nd Conference of Experimental Psychologists (TeaP), Jena, Germany, 2020. Status: Accepted
Fake news recognition in young and older adults
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2020 |
Location of Talk | 62nd Conference of Experimental Psychologists (TeaP), Jena, Germany |
Talks, invited
Fake News, Networks, and Natural Language Processing Tools
In Oslo (virtual), 2020. Status: Published
Fake News, Networks, and Natural Language Processing Tools
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2020 |
Location of Talk | Oslo (virtual) |
Talks, invited
Approximate weight perfect matchings for pivoting in parallel sparse linear solvers
In International Congress on Industrial and Applied Mathematics (ICIAM), Valencia, Spain, 2019. Status: Published
Approximate weight perfect matchings for pivoting in parallel sparse linear solvers
The problem of finding good pivots in scalable sparse direct solvers before factorization has posed considerable algorithmic challenges in the past. Currently, sequential implementations of maximum weight perfect matching algorithms such as MC64 are used due to the lack of alternatives. To overcome this limitation, we propose a fully parallel distributed memory algorithm and show how to derive a factor 2 approximation guarantee. We also discuss a heuristic version that generates perfect matchings of near-optimum weight.
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | International Congress on Industrial and Applied Mathematics (ICIAM), Valencia, Spain |
Type of Talk | Invited Minisymposium Presentation |
PGAS for graph analytics: can one-sided communications break the scalability barrier?
In Computing Frontiers, Alghero, Italy, 2019. Status: Published
PGAS for graph analytics: can one-sided communications break the scalability barrier?
As the world is becoming increasingly interconnected, systems are becoming increasingly complex. Therefore, technologies that can analyze connected systems and their dynamic characteristics become indispensable. Consequently, the last decade has seen increasing interest in graph analytics, which allows obtaining insights from such connected data. Parallel graph analytics can reveal the workings of intricate systems and networks at massive scales, which are found in diverse areas such as social networks, economic transactions, and protein interactions. While sequential graph algorithms have been studied for decades, the recent availability of massive datasets has given rise to the need for parallel graph processing, which poses unique challenges.
Benchmarks such as the Graph 500 have shown that graph processing performance is largely unrelated to traditional measurements of performance such as FLOPS or memory bandwidth. Instead, algorithmic communication aggregation and network latencies play a crucial role here.
In this talk we introduce the area of parallel graph analytics with a special focus on news dissemination, along with the technical challenges it presents and discuss how PGAS systems with support for one-sided messaging, such as UPC++, can help in overcoming these challenges.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | Computing Frontiers, Alghero, Italy |
Type of Talk | Invited Session Talk |
Keywords | Convergence, Graph algorithms, PGAS, UPC++ |
URL | http://www.computingfrontiers.org/2019/program.html |
Towards real time simulations for in silico arrhythmia risk prediction
In Annual Meeting of the Scandinavian Physiological Society, Cardiac Physiology Special Interest Group pre-meeting, Reykjavik, Iceland, 2019. Status: Published
Towards real time simulations for in silico arrhythmia risk prediction
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively. Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
In this talk, we present ongoing work on the parallelization of finite volume computations over an unstructured mesh as well as the challenges involved in building scalable simulation codes and discuss the steps needed to close the gap to accurate real-time computations.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | Annual Meeting of the Scandinavian Physiological Society, Cardiac Physiology Special Interest Group pre-meeting, Reykjavik, Iceland |
Proceedings, refereed
Combining algorithmic rethinking and AVX-512 intrinsics for efficient simulation of subcellular calcium signaling
In International Conference on Computational Science (ICCS 2019). Springer, 2019. Status: Published
Combining algorithmic rethinking and AVX-512 intrinsics for efficient simulation of subcellular calcium signaling
Calcium signaling is vital for the contraction of the heart. Physiologically realistic simulation of this subcellular process requires nanometer resolutions and a complicated mathematical model of differential equations. Since the subcellular space is composed of several irregularly-shaped and intricately-connected physiological domains with distinct properties, one particular challenge is to correctly compute the diffusion-induced calcium fluxes between the physiological domains. The common approach is to pre-calculate the effective diffusion coefficients between all pairs of neighboring computational voxels, and store them in large arrays. Such a strategy avoids complicated if-tests when looping through the computational mesh, but suffers from substantial memory overhead. In this paper, we adopt a memory-efficient strategy that uses a small lookup table of diffusion coefficients. The memory footprint and traffic are both drastically reduced, while also avoiding the if-tests. However, the new strategy induces more instructions on the processor level. To offset this potential performance pitfall, we use AVX-512 intrinsics to effectively vectorize the code. Performance measurements on a Knights Landing processor and a quad-socket Skylake server show a clear performance advantage of the manually vectorized implementation that uses lookup tables, over the counterpart using coefficient arrays.
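A schematic NumPy version of the lookup-table strategy described above (the table contents, shapes, and names here are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical setup: every voxel carries a small integer domain id, and the
# effective diffusion coefficient depends only on the pair of neighbouring
# domain ids, so a tiny table replaces large per-voxel coefficient arrays.
coeff_table = np.array([[1.0, 0.5, 0.0],
                        [0.5, 1.0, 0.2],
                        [0.0, 0.2, 1.0]])  # symmetric: table[a, b] == table[b, a]

def diffusion_flux_1d(domain_ids, conc):
    """Flux between consecutive voxels via a branch-free table gather,
    avoiding if-tests on the domain types in the inner loop."""
    coeff = coeff_table[domain_ids[:-1], domain_ids[1:]]  # gather lookup
    return coeff * (conc[1:] - conc[:-1])
```

The gather-style indexing is the vectorization-friendly part: it maps naturally onto SIMD gather instructions such as those used by the paper's AVX-512 implementation.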
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | International Conference on Computational Science (ICCS 2019) |
Pagination | 681-687 |
Publisher | Springer |
DOI | 10.1007/978-3-030-22750-0_66 |
FACT: a Framework for Analysis and Capture of Twitter Graphs
In The Sixth IEEE International Conference on Social Networks Analysis, Management and Security (SNAMS-2019). IEEE, 2019. Status: Published
FACT: a Framework for Analysis and Capture of Twitter Graphs
In recent years, online social networks have become an important source of news and the primary place for political debates for a growing part of the population. At the same time, the spread of fake news and digital wildfires (fast-spreading and harmful misinformation) has become a growing concern worldwide, and online social networks are where the problem is most prevalent. Thus, the study of social networks is an essential component in the understanding of the fake news phenomenon. Of particular interest is the network connectivity between participants, since it makes communication patterns visible. These patterns are hidden in the offline world, but they have a profound impact on the spread of ideas, opinions and news. Among the major social networks, Twitter is of special interest. Because of its public nature, Twitter offers the possibility to perform research without the risk of breaching the expectation of privacy. However, obtaining sufficient amounts of data from Twitter is a fundamental challenge for many researchers. Thus, in this paper, we present a scalable framework for gathering the graph structure of follower networks, posts and profiles. We also show how to use the collected data for high-performance social network analysis.
Affiliation | Software Engineering |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | The Sixth IEEE International Conference on Social Networks Analysis, Management and Security (SNAMS-2019) |
Pagination | 134-141 |
Publisher | IEEE |
DOI | 10.1109/SNAMS.2019.8931870 |
Multi-Modal Machine Learning for Flood Detection in News, Social Media and Satellite Sequences
In Multimediaeval Benchmark 2019. CEUR Workshop Proceedings, 2019. Status: Published
Multi-Modal Machine Learning for Flood Detection in News, Social Media and Satellite Sequences
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | Multimediaeval Benchmark 2019 |
Date Published | 10/2019 |
Publisher | CEUR Workshop Proceedings |
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
In Computing in Cardiology. Vol. 46. IEEE, 2019. Status: Published
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively. Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
Fortunately, the cost of computing has fallen dramatically in the past decade. A prominent reason for this is the recent introduction of manycore processors such as GPUs, which by now power the majority of the world's leading supercomputers. These devices owe their success to the fact that they are optimized for massively parallel workloads, such as applying similar ODE kernel computations to millions of mesh elements in scientific computing applications. Unlike CPUs, which are typically optimized for sequential performance, this allows GPU architectures to dedicate more transistors to performing computations, thereby increasing parallel speed and energy efficiency.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | Computing in Cardiology |
Volume | 46 |
Date Published | 12/2019 |
Publisher | IEEE |
Talks, contributed
Compiling finite element variational forms for GPU-based assembly
In FEniCS'19, Washington DC, USA, 2019. Status: Published
Compiling finite element variational forms for GPU-based assembly
We present an experimental form compiler for exploring GPU-based algorithms for assembling vectors, matrices, and higher-order tensors from finite element variational forms.
Previous studies by Cecka et al. (2010), Markall et al. (2013), and Reguly and Giles (2015) have explored different strategies for using GPUs for finite element assembly, demonstrating the potential rewards and highlighting some of the difficulties in offloading assembly to a GPU. Even though these studies are limited to a few specific cases, mostly related to the Poisson problem, they already indicate that to achieve high performance, the appropriate assembly strategy depends on the problem at hand and the chosen discretisation.
By using a form compiler to automatically generate code for GPU-based assembly, we can explore a range of problems based on different variational forms and finite element discretisations. In this way, we aim to get a better picture of the potential benefits and challenges of assembling finite element variational forms on a GPU. Ultimately, the goal is to explore algorithms based on automated code generation that offload entire finite element methods to a GPU, including assembly of vectors and matrices and solution of linear systems.
In this talk, we give an exact characterisation of the class of finite element variational forms supported by our compiler, comprising a small subset of the Unified Form Language that is used by FEniCS and Firedrake. Furthermore, we describe a denotational semantics that explains how expressions in the form language are translated to low-level C or CUDA code for performing assembly over a computational mesh. We also present some initial results and discuss the performance of the generated code.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing, Department of Numerical Analysis and Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2019 |
Location of Talk | FEniCS'19, Washington DC, USA |
Keywords | Code translation, GPU, HPC |
Don't Trust Your Eyes: Manipulation of Visual Media in the Age of Deepfakes
In 4th International Conference on Communication & Media Studies, Bonn, Germany, 2019. Status: Published
Don't Trust Your Eyes: Manipulation of Visual Media in the Age of Deepfakes
In 2017, a large number of internet users were shocked to discover that using deep learning technology, realistic-looking videos depicting an arbitrary person performing arbitrary actions can be created easily using nothing but a modern personal computer. In 2018, it quickly became clear that such videos could fool all but the most alert observers. Since most people tend to trust video recordings over most other media, DeepFakes represent a dangerous new tool for manipulating public opinion and thus undermining democracy. In this study, we give an accessible introduction to the technological details that make DeepFakes possible. In the second part, we discuss technological countermeasures as well as the wider implications of this technology and its significance for public opinion in democratic countries.
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2019 |
Location of Talk | 4th International Conference on Communication & Media Studies, Bonn, Germany |
When sharing is not caring: Individual differences in the evaluation of news quality
In 4th International Conference on Communication & Media Studies, Bonn, Germany, 2019. Status: Published
When sharing is not caring: Individual differences in the evaluation of news quality
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2019 |
Location of Talk | 4th International Conference on Communication & Media Studies, Bonn, Germany |
Journal Article
Performance optimization and modeling of fine-grained irregular communication in UPC
Scientific Programming 2019 (2019): Article ID 6825728. Status: Published
Performance optimization and modeling of fine-grained irregular communication in UPC
The UPC programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory sub-systems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. The programmer friendliness, however, can come at the cost of substantial performance penalties. This is especially true when indirectly indexing the elements of a shared array, for which the induced between-thread data communication can be irregular and have a fine-grained pattern. In this paper we study performance enhancement strategies specifically targeting such fine-grained irregular communication in UPC. Starting from explicit thread privatization, continuing with block-wise communication, and arriving at message condensing and consolidation, we obtained considerable performance improvement of UPC programs that originally require fine-grained irregular communication. Besides the performance enhancement strategies, the main contribution of the present paper is to propose performance models for the different scenarios, in the form of quantifiable formulas that hinge on the actual volumes of various data movements plus a small number of easily obtainable hardware characteristic parameters. These performance models help to verify the enhancements obtained, while also providing insightful predictions of similar parallel implementations, not limited to UPC, that also involve between-thread or between-process irregular communication. As a further validation, we also apply our performance modeling methodology and hardware characteristic parameters to an existing UPC code for solving a 2D heat equation on a uniform mesh.
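The message condensing and consolidation idea can be sketched independently of UPC: group the needed element indices by owning thread, so that each pair of threads exchanges one condensed message instead of many one-element ones. A minimal Python illustration, assuming UPC's default cyclic layout (element i lives on thread i % THREADS):

```python
from collections import defaultdict

def consolidate_requests(indices, num_threads):
    """Group remote element indices by owning thread under a cyclic layout,
    so each destination receives one condensed message rather than a stream
    of fine-grained one-element requests."""
    outbox = defaultdict(list)
    for i in indices:
        outbox[i % num_threads].append(i)
    return dict(outbox)
```

Turning many small transfers into one bulk transfer per thread pair is what moves the communication cost from a latency-dominated to a bandwidth-dominated regime in the performance models.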
Affiliation | Scientific Computing |
Project(s) | PREAPP: PRoductivity and Energy-efficiency through Abstraction-based Parallel Programming, Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Journal Article |
Year of Publication | 2019 |
Journal | Scientific Programming |
Volume | 2019 |
Pagination | Article ID 6825728 |
Date Published | 03/2019 |
Publisher | Hindawi |
Keywords | Fine-grained irregular communication, performance modeling, Performance optimization, Sparse matrix-vector multiplication, UPC programming language |
URL | https://www.hindawi.com/journals/sp/2019/6825728/ |
DOI | 10.1155/2019/6825728 |
Talk, keynote
PGAS for graph analytics: can one-sided communications break the scalability barrier?
In EFFECT workshop, Tromsø, Norway, 2019. Status: Published
PGAS for graph analytics: can one-sided communications break the scalability barrier?
As the world is becoming increasingly interconnected, systems are becoming increasingly complex. Therefore, technologies that can analyze connected systems and their dynamic characteristics become indispensable. Consequently, the last decade has seen increasing interest in graph analytics, which allows obtaining insights from such connected data. Parallel graph analytics can reveal the workings of intricate systems and networks at massive scales, which are found in diverse areas such as social networks, economic transactions, and protein interactions. While sequential graph algorithms have been studied for decades, the recent availability of massive datasets has given rise to the need for parallel graph processing, which poses unique challenges.
Benchmarks such as the Graph 500 have shown that graph processing performance is largely unrelated to traditional measurements of performance such as FLOPS or memory bandwidth. Instead, algorithmic communication aggregation and network latencies play a crucial role here.
In this talk we introduce the area of parallel graph analytics with a special focus on news dissemination, along with the technical challenges it presents and discuss how PGAS systems with support for one-sided messaging, such as UPC++, can help in overcoming these challenges.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talk, keynote |
Year of Publication | 2019 |
Location of Talk | EFFECT workshop, Tromsø, Norway |
Date Published | 04/2019 |
Keywords | Convergence, Graph algorithms, PGAS |
Poster
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
International Conference in Computing in Cardiology, Singapore, 2019. Status: Published
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively.
Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
Fortunately, the cost of computing has fallen dramatically in the past decade. A prominent reason for this is the recent introduction of manycore processors such as GPUs, which by now power the majority of the world’s leading supercomputers. These devices owe their success to the fact that they are optimized for massively parallel workloads, such as applying similar ODE kernel computations to millions of mesh elements in scientific computing applications. Unlike CPUs, which are typically optimized for sequential performance, this allows GPU architectures to dedicate more transistors to performing computations, thereby increasing parallel speed and energy efficiency.
In this poster, we present ongoing work on the parallelization of finite volume computations over an unstructured mesh as well as the challenges involved in building scalable simulation codes and discuss the steps needed to close the gap to accurate real-time computations.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2019 |
Date Published | 09/2019 |
Place Published | International Conference in Computing in Cardiology, Singapore |
Talks, invited
A distributed-memory parallel approximation of maximum weight perfect bipartite matching
In Pacific Northwest National Laboratory, Richland, WA, USA, 2018. Status: Published
A distributed-memory parallel approximation of maximum weight perfect bipartite matching
We discuss an efficient parallel approximation algorithm for the problem of maximum weight perfect matching in bipartite graphs, i.e. the problem of finding a set of non-adjacent edges that covers all vertices and has maximum weight. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by finding good pivots in scalable sparse direct solvers before factorization, where sequential implementations of maximum weight perfect matching algorithms are generally used due to the lack of scalable alternatives. To overcome this limitation, we propose a parallel distributed memory algorithm and discuss its approximation properties.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talks, invited |
Year of Publication | 2018 |
Location of Talk | Pacific Northwest National Laboratory, Richland, WA, USA |
Talks, contributed
A distributed-memory parallel approximation of maximum weight perfect bipartite matching
In Sparse Days 2018, Toulouse, France, 2018. Status: Published
A distributed-memory parallel approximation of maximum weight perfect bipartite matching
We discuss an efficient parallel approximation algorithm for the problem of maximum weight perfect matching in bipartite graphs, i.e. the problem of finding a set of non-adjacent edges that covers all vertices and has maximum weight. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by finding good pivots in scalable sparse direct solvers before factorization, where sequential implementations of maximum weight perfect matching algorithms are generally used due to the lack of scalable alternatives. To overcome this limitation, we propose a parallel distributed memory algorithm and discuss its approximation properties.
Affiliation | Scientific Computing |
Project(s) | UMOD: Understanding and Monitoring Digital Wildfires |
Publication Type | Talks, contributed |
Year of Publication | 2018 |
Location of Talk | Sparse Days 2018, Toulouse, France |
Keywords | Bipartite graphs, graph theory, matching, parallel approximation algorithms, transversals |
Notes | Similar talk was given again at Inria Bordeaux in October 2018 |
Journal Article
A Distributed-Memory Approximation Algorithm for Maximum Weight Perfect Bipartite Matching
SIAM Journal on Scientific Computing (2018). Status: Submitted
A Distributed-Memory Approximation Algorithm for Maximum Weight Perfect Bipartite Matching
We design and implement an efficient parallel approximation algorithm for the problem of maximum weight perfect matching in bipartite graphs, i.e. the problem of finding a set of non-adjacent edges that covers all vertices and has maximum weight. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by finding good pivots in scalable sparse direct solvers before factorization, where sequential implementations of maximum weight perfect matching algorithms, such as those available in MC64, are widely used due to the lack of scalable alternatives.
To overcome this limitation, we propose a fully parallel distributed memory algorithm that first generates a perfect matching and then searches for weight-augmenting cycles of length four in parallel and iteratively augments the matching with a vertex disjoint set of such cycles. For most practical problems the weights of the perfect matchings generated by our algorithm are very close to the optimum.
An efficient implementation of the algorithm scales up to 256 nodes (17,408 cores) on a Cray XC40 supercomputer and can solve instances that are too large to be handled by a single node using the sequential algorithm.
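The augmentation step can be sketched on a dense weight matrix; this is a serial toy version of one augmentation pass, not the distributed-memory algorithm the paper implements:

```python
def augment_four_cycles(weight, match):
    """One pass of weight-augmenting 4-cycles on a perfect matching.

    `weight[u][v]` is the edge weight between left vertex u and right
    vertex v; `match[u]` is the right vertex matched to u. A 4-cycle swaps
    the partners of two left vertices u1, u2 whenever
    w(u1,m2) + w(u2,m1) > w(u1,m1) + w(u2,m2), and the applied cycles are
    kept vertex-disjoint within the pass.
    """
    n = len(match)
    used = [False] * n
    for u1 in range(n):
        if used[u1]:
            continue
        for u2 in range(u1 + 1, n):
            if used[u2]:
                continue
            v1, v2 = match[u1], match[u2]
            gain = weight[u1][v2] + weight[u2][v1] - weight[u1][v1] - weight[u2][v2]
            if gain > 0:
                match[u1], match[u2] = v2, v1   # apply the augmenting cycle
                used[u1] = used[u2] = True
                break
    return match
```

In the sketch every edge is assumed to exist; on a sparse graph only 4-cycles whose two non-matching edges are present may be considered, and the paper iterates such passes until no improving cycles remain.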
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Journal Article |
Year of Publication | 2018 |
Journal | SIAM Journal on Scientific Computing |
Publisher | SIAM |
Keywords | Bipartite graphs, graph theory, matching, parallel approximation algorithms, transversals |
Public outreach
ExaGraph Collaboration with STRUMPACK/SuperLU: Factorization-Based Sparse Solvers and Preconditioners for Exascale
ECP Website: Exascale Computing Project (ECP), 2018. Status: Published
ExaGraph Collaboration with STRUMPACK/SuperLU: Factorization-Based Sparse Solvers and Preconditioners for Exascale
When researchers try to solve science and engineering problems, they often create systems of linear equations that need to be solved. Software libraries known as solvers provide mathematical tools that can be applied to similar problems. Direct solvers that use variants of Gaussian elimination are one of the most popular methods for solving such systems due to their robustness, especially for algebraic systems arising from multiphysics simulations.
In many situations, the system is sparse, meaning that the majority of the matrix entries are zero and ideally need neither to be stored nor operated on. The strength of SuperLU and STRUMPACK is that they can automatically determine which matrix entries are zeros and can be ignored, allowing the computer to focus its calculations on the other entries and finish the problem much faster.
Results of the effort showed that the parallel AWPM (approximate-weight perfect matching) code can run up to 2,500x faster than the sequential algorithm on 256 nodes of the Cori/KNL supercomputer.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Public outreach |
Year of Publication | 2018 |
Date Published | 08/2018 |
Publisher | Exascale Computing Project (ECP) |
Place Published | ECP Website |
URL | https://www.exascaleproject.org/exagraph-with-strumpack-superlu/ |
Proceedings, refereed
GPU-based Acceleration of Detailed Tissue-Scale Cardiac Simulations
In Proceedings of the 11th Workshop on General Purpose GPUs. New York, NY, USA: ACM, 2018. Status: Published
We present a GPU-based implementation for tissue-scale 3D simulations of the human cardiac ventricle using a physiologically realistic cell model. Computational challenges in such simulations arise from two factors, the first of which is the sheer amount of computation when simulating a large number of cardiac cells in a detailed model containing 10^4 calcium release units, 10^6 stochastically changing ryanodine receptors and 1.5 × 10^5 L-type calcium channels per cell.
Additional challenges arise from the fact that the computational tasks have various levels of arithmetic intensity and control complexity, which require careful adaptation of the simulation code to the target device. By exploiting the strengths of the GPU, we obtain a performance that is far superior to that of the CPU, and also significantly higher than that of other state-of-the-art manycore devices, thus paving the way for detailed whole-heart simulations in future generations of leadership-class supercomputers.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | Proceedings of the 11th Workshop on General Purpose GPUs |
Pagination | 31-38 |
Publisher | ACM |
Place Published | New York, NY, USA |
DOI | 10.1145/3180270.3180274 |
Memory Bandwidth Contention: Communication vs Computation Tradeoffs in Supercomputers with Multicore Architectures
In International Conference on Parallel and Distributed Systems (ICPADS). Singapore: ACM/IEEE, 2018. Status: Published
We study the problem of contention for memory bandwidth between computation and communication in supercomputers that feature multicore CPUs. The problem arises when communication and computation are overlapped, and both operations compete for the same memory bandwidth. This contention is most visible at the limits of scalability, when communication and computation take similar amounts of time, and thus must be taken into account in order to reach maximum scalability in memory bandwidth bound applications. Typical examples of codes affected by the memory bandwidth contention problem are sparse matrix-vector computations, graph algorithms, and many machine learning problems, as they typically exhibit a high demand for both memory bandwidth and inter-node communication, while performing a relatively low number of arithmetic operations.
The problem is even more relevant in truly heterogeneous computations where CPUs and accelerators are used in concert. In that case it can lead to mispredictions of expected performance and consequently to suboptimal load balancing between CPU and accelerator, which in turn can lead to idling of powerful accelerators and thus to a large decrease in performance.
We propose a simple benchmark in order to quantify the loss of performance due to memory bandwidth contention. Based on that, we derive a theoretical model to determine the impact of the phenomenon on parallel memory-bound applications. We test the model on scientific computations, discuss the practical relevance of the problem and suggest possible techniques to remedy it.
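A back-of-the-envelope version of such a model (our own illustrative formulation, not the paper's exact one) bounds the overlapped runtime from below by the longer of the two phases and by the time needed to push the combined memory traffic of both phases through the shared memory system:

```python
def overlap_lower_bound(t_comp, t_comm, bytes_comp, bytes_comm, bw_total):
    """Lower bound on the runtime of overlapped computation and
    communication that share one memory bus.

    t_comp, t_comm: stand-alone times of the two phases (seconds).
    bytes_comp, bytes_comm: memory traffic of each phase (bytes).
    bw_total: sustainable memory bandwidth (bytes/second).
    Perfect overlap is limited by whichever is larger: the longer
    phase, or the time to move the combined traffic."""
    return max(t_comp, t_comm, (bytes_comp + bytes_comm) / bw_total)
```

When the combined-traffic term dominates, overlapping cannot hide communication at all; this is precisely the contention regime that the proposed benchmark is designed to expose.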
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | International Conference on Parallel and Distributed Systems (ICPADS) |
Publisher | ACM/IEEE |
Place Published | Singapore |
Keywords | Hybrid MPI/OpenMP, Memory bandwidth contention, Multicore supercomputers, performance modeling, Scientific Computing |
Talk, keynote
Heterogeneous HPC Computations in Cardiac Electrophysiology
In 19th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2018), Vancouver, Canada, 2018. Status: Published
Detailed organ-scale simulations of calcium handling and electrical signal transmission in the human heart require stochastic simulation of a large number of ion channels in each cell, which consumes immense processing power for the simulation of a single heartbeat, thereby creating the need for large scale parallel implementations. We present codes for solving such cardiac models on structured and unstructured meshes, and discuss the challenges involved in modernizing these codes to run on heterogeneous supercomputers. We focus on the interaction between OpenMP, MPI, and CUDA in such computations, as well as optimizations to communication and vector processing, and illustrate practical experiences with these applications on different supercomputers.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talk, keynote |
Year of Publication | 2018 |
Location of Talk | 19th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2018), Vancouver, Canada |
Type of Talk | Keynote |
Keywords | Cardiac modeling, GPU, heterogeneous computing, HPC |
URL | http://cse.stfx.ca/~pdsec18/keynote.php |
Poster
Quantifying data traffic of sparse matrix-vector multiplication in a multi-level memory hierarchy
London, UK, 2018. Status: Published
Sparse matrix-vector multiplication (SpMV) is the central operation in an iterative linear solver. On a computer with a multi-level memory hierarchy, SpMV performance is limited by memory or cache bandwidth. Furthermore, for a given sparse matrix, the volume of data traffic depends on the location of the matrix non-zeros. By estimating the volume of data traffic with Aho, Denning and Ullman’s page replacement model [1], we can locate bottlenecks in the memory hierarchy and evaluate optimizations such as matrix reordering. The model is evaluated by comparing with measurements from hardware performance counters on Intel Sandy Bridge.
[1]: Alfred V. Aho, Peter J. Denning, and Jeffrey D. Ullman. 1971. Principles of Optimal Page Replacement. J. ACM 18, 1 (January 1971), pp. 80-93.
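For reference, the SpMV kernel being modeled, in compressed sparse row (CSR) form (a standard formulation; variable names are our own):

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A*x with A stored in compressed sparse row (CSR) format.

    Per row, the values and column indices are streamed through the
    memory hierarchy exactly once, while the accesses to x depend on
    where the non-zeros sit -- which is why the location of the
    non-zeros (and hence matrix reordering) governs cache traffic."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y
```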
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2018 |
Date Published | 06/2018 |
Place Published | London, UK |
Towards Detailed Organ-Scale Simulations in Cardiac Electrophysiology
International Symposium on Computational Science at Scale (CoSaS), Erlangen, Germany, 2018. Status: Published
We present implementations of tissue-scale 3D simulations of the human cardiac ventricle using a physiologically realistic cell model. Computational challenges in such simulations arise from two factors, the first of which is the sheer amount of computation when simulating a large number of cardiac cells in a detailed model containing 10^4 calcium release units, 10^6 stochastically changing ryanodine receptors and 1.5 × 10^5 L-type calcium channels per cell.
Additional challenges arise from the fact that the computational tasks have various levels of arithmetic intensity and control complexity, which require careful adaptation of the simulation code to the target device. By exploiting the strengths of GPUs and manycore accelerators, we obtain a performance that is far superior to that of the basic CPU implementation, thus paving the way for detailed whole-heart simulations in future generations of leadership class supercomputers.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2018 |
Date Published | 09/2018 |
Place Published | International Symposium on Computational Science at Scale (CoSaS), Erlangen, Germany |
Type of Work | Poster |
Keywords | Cardiac electrophysiology, GPU, Scientific Computing, Xeon Phi |
Talk, keynote
Accelerated high-performance computing for computational cardiac electrophysiology
In The University of Tokyo, Tokyo, Japan, 2017. Status: Published
Massively parallel hardware accelerators, such as GPUs, are nowadays prevalent in the HPC hardware landscape. While having tremendous computing power, these accelerators also bring programming challenges. Often, a different programming standard applies for the accelerators than that for the conventional CPUs. For computing clusters that consist of both accelerators and CPUs, where the latter are hosts of the accelerators, elaborate hybrid parallel programming is needed to ensure an efficient use of the heterogeneous hardware.
This talk aims to share some experiences of implementing computational science software for heterogeneous computing platforms. We look at two scenarios: CPU+GPU [1] and CPU+Xeon Phi [2][3] heterogeneous computing. Common for both scenarios is the necessity of a proper pipelining of the involved computational and communication tasks, such that the overhead of various data movements can be reduced or completely masked. Moreover, suitable multi-threading with thread divergence is needed on the CPU host side. This is for enforcing computation-communication overlap, coordinating the accelerators, and allowing the CPU hosts to also contribute with their computing power. We have successfully applied hybrid CPU+Knights Corner co-processor computing [2][3] to two topics of computational cardiac electrophysiology, making use of the Tianhe-2 supercomputer. Results [4] about using the new Xeon Phi Knights Landing processor will also be presented.
[1]. J. Langguth, M. Sourouri, G. T. Lines, S. B. Baden, and X. Cai. Scalable heterogeneous CPU-GPU computations for unstructured tetrahedral meshes. IEEE Micro, 35(4):6–15, 2015.
[2]. J. Chai, J. Hake, N. Wu, M. Wen, X. Cai, G. T. Lines, J. Yang, H. Su, C. Zhang, and X. Liao. Towards simulation of subcellular calcium dynamics at nanometre resolution. International Journal of High Performance Computing Applications, 29(1):51–63, 2015.
[3]. J. Langguth, Q. Lan, N. Gaur, and X. Cai. Accelerating detailed tissue-scale 3D cardiac simulations using heterogeneous CPU-Xeon Phi computing. International Journal of Parallel Programming, 45(5):1236–1258, 2017.
[4]. J. Langguth, C. Jarvis, and X. Cai. Porting tissue-scale cardiac simulations to the Knights Landing platform. Proceedings of ISC High Performance 2017, 376–388, 2017.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talk, keynote |
Year of Publication | 2017 |
Location of Talk | The University of Tokyo, Tokyo, Japan |
Notes | 2nd International Symposium on Research and Education of Computational Science |
Talks, contributed
Heterogeneous Manycore Simulations in Cardiac Electrophysiology
In Tenth International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), Stockholm, Sweden, 2017. Status: Published
The demand for computing power in computational cardiology is continuously increasing, driven by the use of physiologically realistic cell models and the need to simulate the heart at higher resolutions in time and space. While parallel computing itself is by now widely used in the field, the arrival of heterogeneous multicore architectures presents an important opportunity for speeding up the cardiac simulation process. However, this speedup is not obtained effortlessly, and achieving it presents numerous computational challenges of its own.
We present a summary of our experiences from the implementation of multiple cardiac research codes on multicore, manycore, and GPU processors. Our main goal is to highlight the platform-specific challenges we face in our applications, ranging from efficient data movement to optimizations for large-scale irregular computations. Based on that, we discuss the suitability of the different platforms for cardiac computations.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2017 |
Location of Talk | Tenth International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2017), Stockholm, Sweden |
Proceedings, refereed
Porting Tissue-Scale Cardiac Simulations to the Knights Landing Platform
In International Conference on High Performance Computing. Lecture Notes in Computer Science, Springer, 2017. Status: Published
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2017 |
Conference Name | International Conference on High Performance Computing |
Date Published | 10/2017 |
Publisher | Lecture Notes in Computer Science, Springer |
ISBN Number | 978-3-319-67629-6 |
DOI | 10.1007/978-3-319-67630-2_28 |
Towards fine-grained dynamic tuning of HPC applications on modern multi-core architectures
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17). New York, NY, USA: ACM Press, 2017. Status: Published
There is a consensus that exascale systems should operate within a power envelope of 20 MW. Consequently, energy conservation is still considered the most crucial constraint if such systems are to be realized.
So far, most research on this topic focused on strategies such as power capping and dynamic power management. Although these approaches can reduce power consumption, we believe that they might not be sufficient to reach the exascale energy-efficiency goals. Hence, we aim to adopt techniques from embedded systems, where energy-efficiency has always been the fundamental objective.
A successful energy-saving technique used in embedded systems is to integrate fine-grained autotuning with dynamic voltage and frequency scaling. In this paper, we apply a similar technique to a real-world HPC application. Our experimental results on an HPC cluster indicate that such an approach saves up to 20% of energy compared to the baseline configuration, with negligible performance loss.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2017 |
Conference Name | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17) |
Date Published | 11/2017 |
Publisher | ACM Press |
Place Published | New York, NY, USA |
ISBN Number | 9781450351140 |
URL | http://dl.acm.org/citation.cfm?doid=3126908 |
DOI | 10.1145/3126908.3126945 |
Journal Article
Accelerating Detailed Tissue-Scale 3D Cardiac Simulations Using Heterogeneous CPU-Xeon Phi Computing
International Journal of Parallel Programming (2016): 1-23. Status: Published
We investigate heterogeneous computing, which involves both multicore CPUs and manycore Xeon Phi coprocessors, as a new strategy for computational cardiology. In particular, 3D tissues of the human cardiac ventricle are studied with a physiologically realistic model that has 10,000 calcium release units per cell and 100 ryanodine receptors per release unit, together with tissue-scale simulations of the electrical activity and calcium handling. In order to attain resource-efficient use of heterogeneous computing systems that consist of both CPUs and Xeon Phis, we first direct the coding effort at ensuring good performance on the two types of compute devices individually. Although SIMD code vectorization is the main theme of performance programming, the actual implementation details differ considerably between CPU and Xeon Phi. Moreover, in addition to combined OpenMP+MPI programming, a suitable division of the cells between the CPUs and Xeon Phis is important for resource-efficient usage of an entire heterogeneous system. Numerical experiments show that good resource utilization is indeed achieved and that such a heterogeneous simulator paves the way for ultimately understanding the mechanisms of arrhythmia. The uncovered good programming practices can be used by computational scientists who want to adopt similar heterogeneous hardware platforms for a wide variety of applications.
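The "suitable division of the cells" between CPUs and Xeon Phis mentioned above can be illustrated by a static split proportional to measured per-device throughput (a generic load-balancing heuristic sketched with our own names, not necessarily the paper's exact scheme):

```python
def split_cells(n_cells, throughputs):
    """Divide n_cells among heterogeneous devices in proportion to
    their measured throughputs (cells per second), so that all
    devices finish at roughly the same time. Returns per-device cell
    counts summing to n_cells."""
    total = sum(throughputs)
    counts = [int(n_cells * t / total) for t in throughputs]
    # Hand any rounding remainder to the fastest device.
    counts[throughputs.index(max(throughputs))] += n_cells - sum(counts)
    return counts
```

In practice the throughputs would come from a short calibration run of the cell-model kernel on each device.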
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | International Journal of Parallel Programming |
Pagination | 1-23 |
Date Published | 10/2016 |
Publisher | ACM/Springer |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing, Xeon Phi |
DOI | 10.1007/s10766-016-0461-2 |
Proceedings, refereed
Enabling Tissue-Scale Cardiac Simulations Using Heterogeneous Computing on Tianhe-2
In IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). ACM/IEEE, 2016. Status: Published
We develop a simulator for 3D tissue of the human cardiac ventricle with a physiologically realistic cell model and deploy it on the supercomputer Tianhe-2. In order to attain the full performance of the heterogeneous CPU-Xeon Phi design, we use carefully optimized codes for both devices and combine them to obtain suitable load balancing. Using a large number of nodes, we are able to perform tissue-scale simulations of the electrical activity and calcium handling in millions of cells, at a level of detail that tracks the states of trillions of ryanodine receptors. We can thus simulate arrhythmogenic spiral waves and other complex arrhythmogenic patterns which arise from calcium handling deficiencies in human cardiac ventricle tissue. Due to extensive code tuning and parallelization via OpenMP, MPI, and SCIF/COI, large scale simulations of 10 heartbeats can be performed in a matter of hours. Test results indicate excellent scalability, thus paving the way for detailed whole-heart simulations in future generations of leadership class supercomputers.
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) |
Pagination | 843-852 |
Date Published | 12/2016 |
Publisher | ACM/IEEE |
ISSN Number | 1521-9097 |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing, Xeon Phi |
DOI | 10.1109/ICPADS.2016.0114 |
On the Performance and Energy Efficiency of the PGAS Programming Model on Multicore Architectures
In High Performance Computing & Simulation (2016) - International Workshop on Optimization of Energy Efficient HPC & Distributed Systems. ACM IEEE, 2016. Status: Published
Affiliation | Scientific Computing |
Project(s) | PREAPP: PRoductivity and Energy-efficiency through Abstraction-based Parallel Programming , Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | High Performance Computing & Simulation (2016) - International Workshop on Optimization of Energy Efficient HPC & Distributed Systems |
Date Published | 08/2016 |
Publisher | ACM IEEE |
URL | http://dx.doi.org/10.1109/HPCSim.2016.7568416 |
DOI | 10.1109/HPCSim.2016.7568416 |
Talks, invited
Heterogeneous HPC solutions in cardiac electrophysiology
In Lawrence Berkeley National Laboratory, Berkeley, CA, USA, 2016. Status: Published
Detailed simulations of electrical signal transmission in the human heart require immense processing power, thereby creating the need for large scale parallel implementations. We present two heterogeneous codes solving such problems, focusing on the interaction between OpenMP, MPI, and CUDA in irregular computations, and discuss practical experiences on different supercomputers.
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2016 |
Location of Talk | Lawrence Berkeley National Laboratory, Berkeley, CA, USA |
Hierarchical partitioning of unstructured meshes in cardiac electrophysiology
In Third Workshop on Programming Abstractions for Data Locality (PADAL'16), Kobe, Japan, 2016. Status: Published
Unstructured meshes are widely used in computational science and provide numerous advantages over structured meshes in many applications. With respect to data locality, however, their irregular data structures and access patterns pose severe challenges, especially on modern heterogeneous clusters with deep memory hierarchies, which are poised to become the standard in the foreseeable future. We discuss several of the challenges involved in hierarchical partitioning, such as preserving locality in complex accelerator-equipped nodes, load balancing between heterogeneous devices, and reordering for cache efficiency, as well as solutions to these problems implemented in a cell-centered finite volume simulation code from cardiac electrophysiology. Finally, we present our current steps towards truly hierarchical partitioning software for automatic data placement across heterogeneous clusters with varying node architectures.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2016 |
Location of Talk | Third Workshop on Programming Abstractions for Data Locality (PADAL'16), Kobe, Japan |
Journal Article
Codesign Lessons Learned from Implementing Graph Matching on Multithreaded Architectures
IEEE Computer 48 (2015): 46-55. Status: Published
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | IEEE Computer |
Volume | 48 |
Pagination | 46–55 |
Date Published | 08/2015 |
Publisher | ACM IEEE |
URL | http://doi.ieeecomputersociety.org/10.1109/MC.2015.215 |
DOI | 10.1109/MC.2015.215 |
Parallel performance modeling of irregular applications in cell-centered finite volume methods over unstructured tetrahedral meshes
Journal of Parallel and Distributed Computing 76 (2015): 120-131. Status: Published
Finite volume methods are widely used numerical strategies for solving partial differential equations. This paper aims at obtaining a quantitative understanding of the achievable performance of the cell-centered finite volume method on 3D unstructured tetrahedral meshes, using traditional multicore CPUs as well as modern GPUs. By using an optimized implementation and a synthetic connectivity matrix that exhibits a perfect structure of equal-sized blocks lying on the main diagonal, we can closely relate the achievable computing performance to the size of these diagonal blocks. Moreover, we have derived a theoretical model for identifying characteristic levels of the attainable performance as a function of hardware parameters, based on which a realistic upper limit of the performance can be predicted accurately. For real-world tetrahedral meshes, the key to high performance lies in a reordering of the tetrahedra, such that the resulting connectivity matrix resembles a block diagonal form where the optimal size of the blocks depends on the hardware. Numerical experiments confirm that the achieved performance is close to the practically attainable maximum and it reaches 75% of the theoretical upper limit, independent of the actual tetrahedral mesh considered. From this, we develop a general model capable of identifying bottleneck performance of a system’s memory hierarchy in irregular applications.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 76 |
Pagination | 120-131 |
Date Published | 02/2015 |
Publisher | Elsevier |
DOI | 10.1016/j.jpdc.2014.10.005 |
Scalable heterogeneous CPU-GPU computations for unstructured tetrahedral meshes
IEEE Micro 35, no. 4 (2015): 6-15. Status: Published
A recent trend in modern high-performance computing environments is the introduction of powerful, energy-efficient hardware accelerators such as GPUs and Xeon Phi coprocessors. These specialized computing devices coexist with CPUs and are optimized for highly parallel applications. In regular computing-intensive applications with predictable data access patterns, these devices often far outperform CPUs and thus relegate the latter to pure control functions instead of computations. For irregular applications, however, the performance gap can be much smaller and is sometimes even reversed. Thus, maximizing the overall performance on heterogeneous systems requires making full use of all available computational resources, including both accelerators and CPUs.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | IEEE Micro |
Volume | 35 |
Issue | 4 |
Pagination | 6-15 |
Date Published | 07/2015 |
Publisher | ACM IEEE |
DOI | 10.1109/MM.2015.70 |
Proceedings, refereed
CPU+GPU Programming of Stencil Computations for Resource-Efficient Use of GPU Clusters
In IEEE 18th International Conference on Computational Science and Engineering. IEEE Computer Society, 2015. Status: Published
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU programming approach that combines MPI, OpenMP and CUDA. To effectively hide the overhead of various inter- and intra-node communications, a new level of task parallelism is introduced on top of the conventional data parallelism. Combined with a suitable workload division between the CPUs and GPUs, our CPU+GPU programming approach is able to fully utilize the different processing units. The programming details and achievable performance are exemplified by a widely used 3D 7-point stencil computation, which shows high performance and scaling in experiments using up to 64 CPU-GPU nodes.
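For reference, one sweep of the 3D 7-point stencil used as the example computation looks as follows in serial form (pure-Python sketch with our own names; the paper's implementation partitions this grid between CPUs and GPUs and overlaps the MPI halo exchange with computation):

```python
def stencil7(u, c0=1.0, c1=1.0 / 6.0):
    """One sweep of a 3D 7-point stencil over a nested-list grid:
    each interior point is updated from itself and its six axis
    neighbours; boundary points are left unchanged."""
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    # Write into a copy so that every update reads old values only.
    v = [[[u[i][j][k] for k in range(nz)]
          for j in range(ny)] for i in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                v[i][j][k] = c0 * u[i][j][k] + c1 * (
                    u[i - 1][j][k] + u[i + 1][j][k]
                    + u[i][j - 1][k] + u[i][j + 1][k]
                    + u[i][j][k - 1] + u[i][j][k + 1])
    return v
```

The memory-bound character is visible here: seven reads and one write per point, with very little arithmetic in between, which is why idle CPU memory bandwidth is worth harvesting alongside the GPUs.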
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | IEEE 18th International Conference on Computational Science and Engineering |
Pagination | 17-26 |
Date Published | 10/2015 |
Publisher | IEEE Computer Society |
Keywords | CPU+GPU computing, CUDA, GPU, MPI, stencil |
DOI | 10.1109/CSE.2015.33 |
Optimizing Approximate Weighted Matching on Nvidia Kepler K40
In IEEE International Conference on High Performance Computing (HiPC), 2015. Status: Published
Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms generally compute high-quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on the Nvidia Kepler K40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation, and we also experiment with different combinations of work assigned to a warp. Using an exhaustive set of 269 inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by 10x to 100x for over 100 instances, and from 100x to 1000x for 15 instances. We also demonstrate up to 20x speedup relative to 2 threads, and up to 5x relative to 16 threads, on an Intel Xeon platform with 16 cores running the same algorithm.
The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, the algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.
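A serial sketch of the Suitor algorithm itself (the proposal-based formulation; variable names are ours, and the paper's contribution is the efficient GPU parallelization of this loop):

```python
def suitor_matching(adj):
    """Serial sketch of the Suitor algorithm for half-approximate
    weighted matching. adj[u] is a list of (neighbour, weight) pairs.
    Returns suitor, where suitor[v] is the vertex whose proposal to v
    currently stands; u and v are matched iff they hold each other's
    proposals."""
    n = len(adj)
    suitor = [None] * n   # best proposer seen by each vertex so far
    ws = [0.0] * n        # weight of that standing proposal
    todo = list(range(n))
    while todo:
        u = todo.pop()
        # Find u's heaviest neighbour that would accept its proposal,
        # i.e. whose current standing offer is lighter.
        best, best_w = None, 0.0
        for v, w in adj[u]:
            if w > best_w and w > ws[v]:
                best, best_w = v, w
        if best is not None:
            displaced = suitor[best]
            suitor[best], ws[best] = u, best_w
            if displaced is not None:
                todo.append(displaced)  # displaced vertex proposes again
    return suitor
```

Each vertex's proposals are monotone in weight, which gives the half-approximation guarantee and makes concurrent proposals (the GPU setting) resolvable with atomic updates.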
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | IEEE International Conference on High Performance Computing (HiPC) |
Date Published | 12/2015 |
Towards Detailed Tissue-Scale 3D Simulations of Electrical Activity and Calcium Handling in the Human Cardiac Ventricle
In The 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2015). Lecture Notes in Computer Science, Springer Verlag, 2015. Status: Published
We adopt a detailed human cardiac cell model, which has 10,000 calcium release units, in connection with simulating the electrical activity and calcium handling at the tissue scale. This is a computationally intensive problem requiring a combination of efficient numerical algorithms and parallel programming. To this end, we use a method that is based on binomial distributions to collectively study the stochastic state transitions of the 100 ryanodine receptors inside every calcium release unit, instead of individually following each ryanodine receptor. Moreover, the implementation of the parallel simulator has incorporated optimizations in the form of code vectorization and removal of redundant calculations. Numerical experiments show very good parallel performance of the 3D simulator and demonstrate that various physiological behaviors are correctly reproduced. This work thus paves the way for high-fidelity 3D simulations of human ventricular tissues, with the ultimate goal of understanding the mechanisms of arrhythmia.
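The binomial trick described above can be sketched as follows: rather than flipping a coin for each of the 100 ryanodine receptors in a release unit, the number of receptors changing state in a time step is drawn collectively from a binomial distribution (illustrative sketch; the function names and the two-state open/closed simplification are ours):

```python
import math
import random

def binomial_draw(n, p, rng=random.random):
    """Inverse-transform sample from Binomial(n, p): one draw
    replaces n independent per-receptor coin flips."""
    u = rng()
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        if u <= cdf:
            return k
    return n

def update_open_ryrs(n_open, n_total, p_open, p_close):
    """One stochastic time step for a calcium release unit: the
    numbers of opening and closing ryanodine receptors are drawn
    collectively, given per-receptor transition probabilities
    p_open and p_close for this step."""
    closing = binomial_draw(n_open, p_close)
    opening = binomial_draw(n_total - n_open, p_open)
    return n_open - closing + opening
```

Aggregating the 100 receptors per unit into two binomial draws is what makes following 10,000 release units per cell, across millions of cells, computationally feasible.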
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | The 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2015) |
Pagination | 79-92 |
Date Published | 11/2015 |
Publisher | Lecture Notes in Computer Science, Springer Verlag |
ISBN Number | 978-3-319-27136-1 |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing |
URL | http://link.springer.com/chapter/10.1007/978-3-319-27137-8_7 |
DOI | 10.1007/978-3-319-27137-8_7 |
Proceedings, refereed
Heterogeneous CPU-GPU Computing for the Finite Volume Method on 3D Unstructured Meshes
In 20th International Conference on Parallel and Distributed Systems (ICPADS 2014). IEEE, 2014. Status: Published
A recent trend in modern high-performance computing environments is the introduction of accelerators such as GPU and Xeon Phi, i.e. specialized computing devices that are optimized for highly parallel applications and coexist with CPUs. In regular compute-intensive applications with predictable data access patterns, these devices often outperform traditional CPUs by far and thus relegate them to pure control functions instead of computations. For irregular applications, however, the gap in relative performance can be much smaller, and sometimes even reversed. Thus, maximizing overall performance in such systems requires making full use of all available computational resources. In this paper we study the attainable performance of the cell-centered finite volume method on 3D unstructured tetrahedral meshes using heterogeneous systems consisting of CPUs and multiple GPUs. Finite volume methods are widely used numerical strategies for solving partial differential equations. The advantages of using finite volumes include built-in support for conservation laws and suitability for unstructured meshes. Our focus lies in demonstrating how a workload distribution that maximizes overall performance can be derived from the actual performance attained by the different computing devices in the heterogeneous environment. We also highlight the dual role of partitioning software in reordering and partitioning the input mesh, thus giving rise to a new combined approach to partitioning.
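The idea of deriving a workload distribution from attained performance can be sketched as follows: measure each device's throughput on the actual kernel, then hand out mesh cells proportionally so all devices finish at roughly the same time. The function name and the throughput numbers are hypothetical, and this is a simplification of the paper's approach (which also involves mesh partitioning and reordering).

```python
def split_workload(n_cells, throughputs):
    """Distribute mesh cells across devices in proportion to each
    device's measured throughput (cells per second), so that all
    devices finish their share at roughly the same time."""
    total = sum(throughputs)
    shares = [int(n_cells * t / total) for t in throughputs]
    shares[0] += n_cells - sum(shares)  # rounding remainder goes to device 0
    return shares

# hypothetical measurements: one CPU at 2 Mcells/s, two GPUs at 9 Mcells/s each
shares = split_workload(1_000_000, [2.0, 9.0, 9.0])
print(shares)  # → [100000, 450000, 450000]
```

Because the split is driven by measured rather than peak performance, it automatically accounts for the smaller CPU-GPU gap seen in irregular, unstructured-mesh computations.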
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | 20th International Conference on Parallel and Distributed Systems (ICPADS 2014) |
Pagination | 191-199 |
Publisher | IEEE |
DOI | 10.1109/PADSW.2014.7097808 |
Journal Article
On Parallel Push-Relabel Based Algorithms for Bipartite Maximum Matching
Parallel Computing 40, no. 7 (2014): 289-308. Status: Published
We study multithreaded push-relabel based algorithms for computing maximum cardinality matching in bipartite graphs. Matching is a fundamental combinatorial problem with applications in a wide variety of problems in science and engineering. We are motivated by its use in the context of sparse linear solvers for computing the maximum transversal of a matrix. Other applications can be found in many fields such as bioinformatics (Azad et al., 2010) [4], scheduling (Timmer and Jess, 1995) [27], and chemical structure analysis (John, 1995) [14]. We implement and test our algorithms on several multi-socket multicore systems and compare their performance to state-of-the-art augmenting path-based serial and parallel algorithms using a test set comprising a wide range of real-world instances. Building on several heuristics for enhancing performance, we demonstrate good scaling for the parallel push-relabel algorithm. We show that it is comparable to the best augmenting path-based algorithms for bipartite matching. To the best of our knowledge, this is the first extensive study of multithreaded push-relabel based algorithms. In addition to a direct impact on the applications using matching, the proposed algorithmic techniques can be extended to preflow-push based algorithms for computing maximum flow in graphs.
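For readers unfamiliar with the problem, the serial augmenting-path baseline that the paper compares against can be sketched in a few lines. This is Kuhn's classical algorithm, not the paper's parallel push-relabel method; the function name and the toy graph are illustrative.

```python
def max_bipartite_matching(adj, n_right):
    """Serial augmenting-path (Kuhn's) algorithm for maximum
    cardinality bipartite matching. adj[u] lists the right-side
    neighbours of left vertex u."""
    match_right = [-1] * n_right  # match_right[v] = left partner of v, or -1

    def try_augment(u, visited):
        # search for an augmenting path starting at left vertex u
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))

# toy instance: 3 left and 3 right vertices
matched = max_bipartite_matching([[0, 1], [0], [1, 2]], 3)
print(matched)  # → 3
```

Push-relabel algorithms take a different route: rather than tracing whole augmenting paths, they perform local push and relabel operations on individual vertices, which is precisely what makes them amenable to the multithreaded execution studied in the paper.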
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | Parallel Computing |
Volume | 40 |
Issue | 7 |
Number | 7 |
Pagination | 289-308 |
Date Published | 07/2014 |
Publisher | Elsevier |
DOI | 10.1016/j.parco.2014.03.004 |
Proceedings, refereed
On the GPU Performance of Cell-Centered Finite Volume Method Over Unstructured Tetrahedral Meshes
In Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms. New York: ACM, 2013. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms |
Publisher | ACM |
Place Published | New York |
DOI | 10.1145/2535753.2535765 |
Journal Article
Push-relabel based algorithms for the maximum transversal problem
Computers & Operations Research 40 (2013): 1266-1275. Status: Published
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2013 |
Journal | Computers & Operations Research |
Volume | 40 |
Pagination | 1266–1275 |
Date Published | 02/2013 |
Publisher | Elsevier |
URL | http://dx.doi.org/10.1016/j.cor.2012.12.009 |
DOI | 10.1016/j.cor.2012.12.009 |