Publications
Journal Article
Enabling unstructured-mesh computation on massively tiled AI processors: An example of accelerating in silico cardiac simulation
Frontiers in Physics 11 (2023). Status: Published
A new trend in processor architecture design is the packaging of thousands of small processor cores into a single device, where there is no device-level shared memory but each core has its own local memory. Thus, both the work and data of an application code need to be carefully distributed among the small cores, also termed tiles. In this paper, we investigate how numerical computations that involve unstructured meshes can be efficiently parallelized and executed on a massively tiled architecture. Graphcore IPUs are chosen as the target hardware platform, to which we port an existing monodomain solver that simulates cardiac electrophysiology over realistic 3D irregular heart geometries. There are two computational kernels in this simulator, where a 3D diffusion equation is discretized over an unstructured mesh and numerically approximated by repeatedly executing sparse matrix-vector multiplications (SpMVs), whereas an individual system of ordinary differential equations (ODEs) is explicitly integrated per mesh cell. We demonstrate how a new style of programming that uses Poplar/C++ can be used to port these commonly encountered computational tasks to Graphcore IPUs. In particular, we describe a per-tile data structure that is adapted to facilitate the inter-tile data exchange needed for parallelizing the SpMVs. We also study the achievable performance of the ODE solver that heavily depends on special mathematical functions, as well as their accuracy on Graphcore IPUs. Moreover, topics related to using multiple IPUs and performance analysis are addressed. In addition to demonstrating an impressive level of performance that can be achieved by IPUs for monodomain simulation, we also provide a discussion on the generic theme of parallelizing and executing unstructured-mesh multiphysics computations on massively tiled hardware.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, SparCity: An Optimization and Co-design Framework for Sparse Computation |
Publication Type | Journal Article |
Year of Publication | 2023 |
Journal | Frontiers in Physics |
Volume | 11 |
Date Published | 03/2023 |
Publisher | Frontiers |
ISSN | 2296-424X |
Keywords | hardware accelerator, heterogeneous computing, irregular meshes, scientific computation, scientific computation on MIMD processors, sparse matrix-vector multiplication (SpMV) |
URL | https://www.frontiersin.org/articles/10.3389/fphy.2023.979699/full |
DOI | 10.3389/fphy.2023.979699 |
Talks, contributed
An Operator-Splitting Approach to Solving Cell-Based Mathematical Models of Cardiac Tissue using Modern CPU Architectures
In SIAM Conference on Parallel Processing for Scientific Computing, 2022. Status: Published
A number of pathologies related to the electrical activity in the heart can be studied using computer simulations of reaction-diffusion models. With the recent extracellular-membrane-intracellular (EMI) model, the geometry of each cell is resolved in the mesh, allowing for a more accurate representation of cardiac tissue on the cell scale. However, the EMI model requires a very fine mesh, and the linear systems arising from the diffusion process in the extracellular and the intracellular domains are ill-conditioned.
In this talk, we present an improved operator-splitting method that decouples the intracellular and extracellular domains, such that each sub-problem becomes a classical elliptic partial differential equation. With this method, the computing time scales linearly with the problem size, and the linear systems can be solved efficiently on shared-memory parallel computers. We demonstrate this by solving a system of 512 × 256 cardiac cells, involving linear systems with approximately 250 million degrees of freedom.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology |
Publication Type | Talks, contributed |
Year of Publication | 2022 |
Location of Talk | SIAM Conference on Parallel Processing for Scientific Computing |
Development of a Biventricular Coordinate System with Representation of an Anatomically Detailed Base
In Tampere, 2022. Status: Accepted
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology |
Publication Type | Talks, contributed |
Year of Publication | 2022 |
Location of Talk | Tampere |
Type of Talk | Conference |
Book Chapter
Conduction Velocity in Cardiac Tissue as Function of Ion Channel Conductance and Distribution
In Computational Physiology - Simula Summer School 2021 − Student Reports, 41-50. Vol. 12. Cham: Springer International Publishing, 2022. Status: Published
Ion channels on the membrane of cardiomyocytes regulate the propagation of action potentials from cell to cell and hence are essential for the proper function of the heart. Through computer simulations with the classical monodomain model for cardiac tissue and the more recent extracellular-membrane-intracellular (EMI) model where individual cells are explicitly represented, we investigated how conduction velocity (CV) in cardiac tissue depends on the strength of various ion currents as well as on the spatial distribution of the ion channels. Our simulations show a sharp decrease in CV when reducing the strength of the sodium (Na+) currents, whereas independent reductions in the potassium (K1 and Kr) and L-type calcium currents have negligible effect on the CV. Furthermore, we find that an increase in number density of Na+ channels towards the cell ends increases the CV, whereas a higher number density of K1 channels slightly reduces the CV. These findings contribute to the understanding of the role of ion channels (e.g. Na+ and K+ channels) in determining the propagation velocity of action potentials in the heart.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology |
Publication Type | Book Chapter |
Year of Publication | 2022 |
Book Title | Computational Physiology - Simula Summer School 2021 − Student Reports |
Volume | 12 |
Chapter | 4 |
Pagination | 41-50 |
Date Published | 05/2022 |
Publisher | Springer International Publishing |
Place Published | Cham |
ISBN Number | 978-3-031-05163-0 |
ISSN | 2512-1677 |
Keywords | conduction velocity, EMI model, ion channels |
URL | https://link.springer.com/chapter/10.1007/978-3-031-05164-7_4 |
DOI | 10.1007/978-3-031-05164-7_4 |
Journal Article
Resource-efficient use of modern processor architectures for numerically solving cardiac ionic cell models
Frontiers in Physiology 13 (2022). Status: Published
A central component in simulating cardiac electrophysiology is the numerical solution of nonlinear ordinary differential equations, also called cardiac ionic cell models, that describe cross-cell-membrane ion transport. Biophysically detailed cell models often require a considerable amount of computation, including calls to special mathematical functions. This paper systematically studies how to efficiently use modern multicore CPUs for this costly computational task. We start by investigating the code restructurings needed to effectively enable compiler-supported SIMD vectorisation, which is the most important performance booster in this context. It is found that suitable OpenMP directives are sufficient for achieving both vectorisation and parallelisation. We then continue with an evaluation of the performance optimisation technique of using lookup tables. Due to increased challenges for automated vectorisation, the obtainable benefits of lookup tables are dependent on the hardware platforms chosen. Throughout the study, we report detailed time measurements obtained on Intel Xeon, Xeon Phi, AMD Epyc and two ARM processors including Fujitsu A64FX, while attention is also paid to the impact of SIMD vectorisation and lookup tables on the computational accuracy. As a realistic example, the benefits of performance enhancement are demonstrated by a 10^9-run ensemble on the OakForest-PACS system, where code restructurings and SIMD vectorisation yield an 84% reduction in computing time, corresponding to 63,270 node hours.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, MicroCard: Numerical modeling of cardiac electrophysiology at the cellular scale |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Frontiers in Physiology |
Volume | 13 |
Date Published | 06/2022 |
Publisher | Frontiers |
ISSN | 1664-042X |
URL | https://www.frontiersin.org/article/10.3389/fphys.2022.904648 |
DOI | 10.3389/fphys.2022.904648 |
Journal Article
Efficient numerical solution of the EMI model representing the extracellular space (E), cell membrane (M) and intracellular space (I) of a collection of cardiac cells
Frontiers in Physics 8 (2021): 579461. Status: Published
The EMI model represents excitable cells in a more accurate manner than traditional homogenized models at the price of increased computational complexity. The increased complexity of solving the EMI model stems from a significant increase in the number of computational nodes and from the form of the linear systems that need to be solved. Here, we will show that the latter problem can be solved by careful use of operator splitting of the spatially coupled equations. By using this method, the linear systems can be broken into sub-problems that are of the classical type of linear, elliptic boundary value problems. Therefore, the vast collection of methods for solving linear, elliptic partial differential equations can be used. We demonstrate that this enables us to solve the systems using shared-memory parallel computers. The computing time scales perfectly with the number of physical cells. For a collection of 512×256 cells, we manage to solve linear systems with about 2.5×10^8 unknowns. Since the computational effort scales linearly with the number of physical cells, we believe that larger computers can be used to simulate millions of excitable cells and thus allow careful analysis of physiological systems of great importance.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Physics |
Volume | 8 |
Pagination | 579461 |
Publisher | Frontiers |
URL | https://www.frontiersin.org/articles/10.3389/fphy.2020.579461/full |
DOI | 10.3389/fphy.2020.579461 |
Book Chapter
Operator Splitting and Finite Difference Schemes for Solving the EMI Model
In Modeling Excitable Tissue: The EMI Framework, 44-55. Vol. 7. Cham: Springer International Publishing, 2021. Status: Published
We want to be able to perform accurate simulations of a large number of cardiac cells based on mathematical models where each individual cell is represented in the model. This implies that the computational mesh has to have a typical resolution of a few µm, leading to huge computational challenges. In this paper we use a certain operator splitting of the coupled equations and show that this leads to systems that can be solved in parallel. This opens up the possibility of simulating large numbers of coupled cardiac cells.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, Department of High Performance Computing |
Publication Type | Book Chapter |
Year of Publication | 2021 |
Book Title | Modeling Excitable Tissue: The EMI Framework |
Volume | 7 |
Chapter | 4 |
Pagination | 44-55 |
Publisher | Springer International Publishing |
Place Published | Cham |
ISBN Number | 978-3-030-61156-9 |
ISSN | 2512-1677 |
URL | http://link.springer.com/content/pdf/10.1007/978-3-030-61157-6_4 |
DOI | 10.1007/978-3-030-61157-6_4 |
Poster
Efficient simulations of patient-specific electrical heart activity on the DGX-2
GPU Technology Conference (GTC) 2020, Silicon Valley, USA: Nvidia, 2020. Status: Published
Patients who have suffered a heart attack have an elevated risk of developing arrhythmia. The use of computer simulations of the electrical activity in the hearts of these patients is emerging as an alternative to traditional, more invasive examinations performed by doctors today. Recent advances in personalised arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations of the electrical activity in the heart require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic, where diagnosis and treatment planning can be time sensitive, if only because of operating schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study them collaboratively. Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
We have developed a code that is capable of performing highly efficient heart simulations on the DGX-2, making use of all 16 V100 GPUs. Using a patient-specific unstructured tetrahedral mesh with 11.7 million cells, we are able to simulate the electrical heart activity at 1/30 of real-time. Moreover, we are able to show that the throughput achieved using all 16 GPUs in the DGX-2 is 77.6% of the theoretical maximum.
We achieved this through extensive optimisations of the two kernels constituting the body of the main loop in the simulator. In the kernel solving the diffusion equation (governing the spread of the electrical signal), consisting of a sparse matrix-vector multiplication, we minimise the memory traffic by reordering the mesh (and matrix) elements into clusters that fit in the V100's L2 cache. In the kernel solving the cell model (describing the complex interactions of ion channels in the cell membrane), we apply sophisticated domain-specific optimisations to reduce the number of floating point operations to the point where the kernel becomes memory bound. After optimisation, both kernels are memory bound, and we derive the minimum memory traffic, which we then divide by the aggregate memory bandwidth to obtain a lower bound on the execution time.
Topics discussed include optimisations for sparse matrix-vector multiplications, strategies for handling inter-device communication for unstructured meshes, and lessons we learnt while programming the DGX-2.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2020 |
Date Published | 03/2020 |
Publisher | Nvidia |
Place Published | GPU Technology Conference (GTC) 2020, Silicon Valley, USA |
Master's thesis
Solving the monodomain model efficiently on GPUs
In The University of Oslo, Department of Informatics, 2019. Status: Published
Patients who have suffered a myocardial infarction have an elevated risk of developing arrhythmia. The use of in silico experiments of the electrical activity in the hearts of these patients is emerging as an alternative to traditional, more invasive in situ examinations. One of the principal barriers to the use of in silico experiments is the tremendous amount of computational power required to perform such simulations.
Building on an existing code, we create a complete solver for the monodomain model, which describes the electrical activity in the heart. Through extensive optimisations, we manage to efficiently utilise an NVIDIA DGX-2 machine, which is currently the most powerful single-box general-purpose computer with its 16 V100 GPUs.
With this solver, we achieve simulation speeds of 2 heartbeats per wall clock minute on the DGX-2 using a realistic unstructured tetrahedral mesh with 11.7 million cells, and we show that the achieved execution time using all 16 GPUs in the DGX-2 is only 30.2% higher than the theoretical lower bound.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, Department of Computational Physiology |
Publication Type | Master's thesis |
Year of Publication | 2019 |
Degree awarding institution | The University of Oslo |
Pagination | 117 |
Date Published | 09/2019 |
Publisher | Department of Informatics, University of Oslo |
Keywords | CUDA, electrocardiology, GPU, heterogeneous computing, High-performance computing, monodomain model |
URL | http://urn.nb.no/URN:NBN:no-74080 |
Poster
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
International Conference in Computing in Cardiology, Singapore, 2019. Status: Published
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic, where diagnosis and treatment planning can be time sensitive, if only because of operating schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study them collaboratively.
Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
Fortunately, the cost of computing has fallen dramatically in the past decade. A prominent reason for this is the recent introduction of manycore processors such as GPUs, which by now power the majority of the world’s leading supercomputers. These devices owe their success to the fact that they are optimized for massively parallel workloads, such as applying similar ODE kernel computations to millions of mesh elements in scientific computing applications. Unlike CPUs, which are typically optimized for sequential performance, this allows GPU architectures to dedicate more transistors to performing computations, thereby increasing parallel speed and energy efficiency.
In this poster, we present ongoing work on the parallelization of finite volume computations over an unstructured mesh as well as the challenges involved in building scalable simulation codes and discuss the steps needed to close the gap to accurate real-time computations.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2019 |
Date Published | 09/2019 |
Place Published | International Conference in Computing in Cardiology, Singapore |