Publications
Journal Article
A cell-based framework for modeling cardiac mechanics
Biomechanics and Modeling in Mechanobiology (2023). Status: Published
A cell-based framework for modeling cardiac mechanics
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology |
Publication Type | Journal Article |
Year of Publication | 2023 |
Journal | Biomechanics and Modeling in Mechanobiology |
Date Published | 01/2023 |
Publisher | Springer |
ISSN | 1617-7959 |
Keywords | Cardiac Mechanics, cardiomyocyte contraction, cell geometries, intracellular and extracellular mechanics, microscale modeling |
URL | https://link.springer.com/article/10.1007/s10237-022-01660-8 |
DOI | 10.1007/s10237-022-01660-8 |
Talks, contributed
Modeling cardiac mechanics using a cell-based framework
In 15th World Congress on Computational Mechanics (WCCM-XV), Yokohama, Japan, 2022. Status: Published
Modeling cardiac mechanics using a cell-based framework
Cardiac tissue primarily consists of interconnected cardiac cells that contract in a synchronized manner as the heart beats. Most computational models of cardiac tissue, however, homogenize out the individual cells and their surroundings. This approach has been immensely useful for describing cardiac mechanics at an overall level, but it gives very limited insight into the interaction between individual cells and their immediate surroundings. Several models have been developed for single cells; see, e.g., [1, 2]. In this work, we extend the mechanical part of these frameworks to a domain representing multiple cells, allowing us to investigate cell–matrix and cell–cell interactions. We present a mechanical model in which each cell and the extracellular matrix have an explicit geometrical representation, similar to the electrophysiological model presented in [3]. The strain energy functions are defined separately for the intracellular and extracellular subdomains, while we assume continuity of displacement and stresses along the membrane. Active tension is assigned only to the intracellular subdomain. For each state, we find an equilibrium solution using the finite element method. We explore passive and active mechanics for a single cell surrounded by an extracellular matrix and for small collections of cells combined into tissue blocks. The explicit geometric representation gives rise to highly varying strain and stress patterns. We show that the extracellular matrix stiffness strongly influences the cardiomyocyte stresses during contraction. Through large-scale simulations enabled by high-performance computing, we also demonstrate that our model can be scaled to small collections of cells, resembling small cardiac tissue samples.
[1] Tracqui, P. and Ohayon, J. An integrated formulation of anisotropic force–calcium relations driving spatio-temporal contractions of cardiac myocytes. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences (2009).
[2] Ruiz-Baier, R., Gizzi, A., Rossi, S., Cherubini, C., Laadhari, A., Filippi, S. and Quarteroni, A. Mathematical modelling of active contraction in isolated cardiomyocytes. Mathematical Medicine and Biology (2014).
[3] Tveito, A., Jæger, K. H., Kuchta, M., Mardal, K.-A. and Rognes, M. E. A cell-based framework for numerical modeling of electrical conduction in cardiac tissue. Frontiers in Physics (2017).
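To make the modeling idea above concrete, here is a minimal sketch (not the authors' code) in legacy FEniCS/DOLFIN: separate passive strain-energy densities are defined on an intracellular and an extracellular subdomain of one mesh, active tension is added only to the intracellular part, and an equilibrium displacement is found by minimising the total potential energy. The geometry, material law and parameter values are illustrative assumptions, not those of the work described here.

```python
# Minimal sketch of subdomain-wise strain energies with intracellular active tension.
# Assumes legacy FEniCS/DOLFIN; geometry and parameters are purely illustrative.
from dolfin import *

mesh = UnitCubeMesh(8, 8, 8)

# Hypothetical subdomain markers: 1 = intracellular (cell), 2 = extracellular matrix.
markers = MeshFunction("size_t", mesh, mesh.topology().dim(), 2)
CompiledSubDomain("x[0] > 0.25 && x[0] < 0.75 && "
                  "x[1] > 0.25 && x[1] < 0.75 && "
                  "x[2] > 0.25 && x[2] < 0.75").mark(markers, 1)
dx = Measure("dx", domain=mesh, subdomain_data=markers)

V = VectorFunctionSpace(mesh, "P", 1)
u = Function(V)
v = TestFunction(V)

# Kinematics.
I = Identity(3)
F = I + grad(u)
C = F.T * F
J = det(F)
I1 = tr(C)

# Illustrative (assumed) parameters; the paper uses its own material laws and values.
mu_i, mu_e = Constant(10.0), Constant(1.0)   # intracellular stiffer than matrix here
kappa = Constant(100.0)                      # weak incompressibility penalty
Ta = Constant(5.0)                           # active tension, intracellular only
f0 = Constant((1.0, 0.0, 0.0))               # fibre direction

def psi_passive(mu):
    # Simple compressible neo-Hookean energy density.
    return mu / 2 * (I1 - 3 - 2 * ln(J)) + kappa / 2 * (J - 1) ** 2

# Active contribution along the fibre direction, intracellular subdomain only.
I4f = inner(C * f0, f0)
psi_active = Ta / 2 * (I4f - 1)

# Total potential energy: passive in both subdomains, active only in subdomain 1.
Pi = (psi_passive(mu_i) + psi_active) * dx(1) + psi_passive(mu_e) * dx(2)

# Fix the outer boundary; find equilibrium by solving dPi/du = 0 with Newton's method.
bc = DirichletBC(V, Constant((0.0, 0.0, 0.0)), "on_boundary")
residual = derivative(Pi, u, v)
solve(residual == 0, u, bc)
```

Because both subdomains share one conforming displacement space, continuity of displacement across the membrane is automatic, and traction continuity follows from the weak form.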
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology |
Publication Type | Talks, contributed |
Year of Publication | 2022 |
Location of Talk | 15th World Congress on Computational Mechanics (WCCM-XV), Yokohama, Japan |
Publisher | 15th World Congress on Computational Mechanics (WCCM-XV) |
Type of Talk | Contributed |
Keywords | cardiomyocyte contraction, cell-based geometries, intracellular and extracellular mechanics, microscale cardiac mechanics |
URL | https://prezi.com/view/uGIK0kQvrZ6G1CNOkc73/ |
Journal Article
On memory traffic and optimisations for low-order finite element assembly algorithms on multi-core CPUs
ACM Transactions on Mathematical Software 48, no. 2 (2022): 1–31. Status: Published
On memory traffic and optimisations for low-order finite element assembly algorithms on multi-core CPUs
Motivated by the wish to understand the achievable performance of finite element assembly on unstructured computational meshes, we dissect the standard cellwise assembly algorithm into four kernels, two of which are dominated by irregular memory traffic. Several optimisation schemes are studied together with associated lower and upper bounds on the estimated memory traffic volume. Apart from properly reordering the mesh entities, the two most significant optimisations include adopting a lookup table in adding element matrices or vectors to their global counterparts, and using a row-wise assembly algorithm for multi-threaded parallelisation. Rigorous benchmarking shows that, due to the various optimisations, the actual volumes of memory traffic are in many cases very close to the estimated lower bounds. These results confirm the effectiveness of the optimisations, while also providing a recipe for developing efficient software for finite element assembly.
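As an illustration of the lookup-table optimisation mentioned above (a sketch of the general idea, not the paper's implementation), the following Python/SciPy code precomputes, for each cell, the offsets in the CSR value array where its element-matrix entries belong; assembly then reduces to plain indexed additions, with no searching of column indices in the inner loop. The tiny 1D mesh at the end exists only to make the example self-contained.

```python
# Sketch of lookup-table-based finite element assembly into a CSR matrix.
import numpy as np
import scipy.sparse as sp

def build_csr_pattern(cells, num_dofs):
    """Sparsity pattern of the global matrix from cell-to-dof connectivity."""
    nd = cells.shape[1]
    rows = np.repeat(cells, nd, axis=1).ravel()
    cols = np.tile(cells, (1, nd)).ravel()
    A = sp.csr_matrix((np.zeros(len(rows)), (rows, cols)), shape=(num_dofs, num_dofs))
    A.sort_indices()
    return A

def build_lookup_table(A, cells):
    """For each cell, the offsets into A.data of its element-matrix entries."""
    nd = cells.shape[1]
    table = np.empty((len(cells), nd * nd), dtype=np.int64)
    for c, dofs in enumerate(cells):
        k = 0
        for i in dofs:
            row_cols = A.indices[A.indptr[i]:A.indptr[i + 1]]
            for j in dofs:
                # One binary search per entry at setup time; none during assembly.
                table[c, k] = A.indptr[i] + np.searchsorted(row_cols, j)
                k += 1
    return table

def assemble(A, cells, element_matrices, table):
    """Scatter-add all element matrices using the precomputed lookup table."""
    A.data[:] = 0.0
    for c in range(len(cells)):
        np.add.at(A.data, table[c], element_matrices[c].ravel())
    return A

# Tiny example: 1D mesh with 4 linear elements (2 dofs per cell), 5 dofs in total.
cells = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
A = build_csr_pattern(cells, 5)
table = build_lookup_table(A, cells)
Ke = np.array([[1.0, -1.0], [-1.0, 1.0]])   # reference "stiffness" matrix
assemble(A, cells, [Ke] * len(cells), table)
print(A.toarray())
```

The paper additionally reorders mesh entities and uses a row-wise assembly algorithm for multi-threaded parallelisation; those aspects are omitted from this sketch.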
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | ACM Transactions on Mathematical Software |
Volume | 48 |
Issue | 2 |
Number | 19 |
Pagination | 1–31 |
Date Published | 05/2022 |
Publisher | Association for Computing Machinery (ACM) |
ISSN | 0098-3500 |
DOI | 10.1145/3503925 |
Poster
Automated Code Generation for GPU-Based Finite Element Computations in FEniCS
SIAM Conference on Computational Science and Engineering (CSE21): SIAM, 2021. Status: Published
Automated Code Generation for GPU-Based Finite Element Computations in FEniCS
Developing high-performance finite element codes normally requires hand-crafting and fine tuning of computational kernels, which is not an easy task to carry out for each and every problem. Automated code generation has proved to be a highly productive alternative for frameworks like FEniCS, where a compiler is used to automatically generate suitable kernels from high-level mathematical descriptions of finite element problems. This strategy has so far enabled users to develop and run a variety of high-performance finite element solvers on clusters of multicore CPUs. We have recently enhanced FEniCS with GPU acceleration by enabling its internal compiler to generate CUDA kernels that are needed to offload finite element calculations to GPUs, particularly the assembly of linear systems. This poster presents the results of GPU-accelerating FEniCS and explores performance characteristics of auto-generated CUDA kernels and GPU-based assembly of linear systems for finite element methods.
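As a rough illustration of the approach (a sketch, not the code FEniCS actually generates), the snippet below shows how a Python "compiler" might emit a CUDA kernel that assembles one cell per thread and scatters each element matrix into the global value array with atomic additions. The element kernel body is a hypothetical placeholder standing in for what a form compiler would derive from the variational form.

```python
# Sketch of emitting a CUDA cellwise-assembly kernel from Python (illustrative only).
CUDA_TEMPLATE = """\
__global__ void assemble_cells(int num_cells,
                               const double* __restrict__ coords,
                               double* __restrict__ values,
                               const int* __restrict__ lookup)
{{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= num_cells) return;

    double Ae[{n}*{n}];
    // A real generated kernel would compute Ae from coords for cell c.
    {tabulate_tensor}

    // Scatter the element matrix into the global CSR value array.
    for (int k = 0; k < {n}*{n}; ++k)
        atomicAdd(&values[lookup[c*{n}*{n} + k]], Ae[k]);
}}
"""

def generate_assembly_kernel(ndofs_per_cell, tabulate_tensor_body):
    """Emit CUDA source for a cellwise assembly kernel of a given element size."""
    return CUDA_TEMPLATE.format(n=ndofs_per_cell, tabulate_tensor=tabulate_tensor_body)

# Placeholder element kernel body so the example is self-contained.
body = "for (int k = 0; k < {n}*{n}; ++k) Ae[k] = 1.0;".format(n=3)
print(generate_assembly_kernel(3, body))
```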
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2021 |
Date Published | 03/2021 |
Publisher | SIAM |
Place Published | SIAM Conference on Computational Science and Engineering (CSE21) |
PhD Thesis
High-performance finite element computations: Performance modelling, optimisation, GPU acceleration & automated code generation
PhD thesis, University of Oslo, Oslo, Norway, 2021. Status: Published
High-performance finite element computations: Performance modelling, optimisation, GPU acceleration & automated code generation
Computer experiments have become a valuable tool for investigating various physical and biological processes described by partial differential equations (PDEs), such as weather forecasting or modelling the mechanical behaviour of cardiac tissue. Finite element methods are a class of numerical methods for solving PDEs that are often preferred, but these methods are rather difficult to implement correctly, let alone efficiently.
This thesis investigates the performance of several key computational kernels involved in finite element methods. First, a performance model is developed to better understand sparse matrix-vector multiplication, which is central to solving linear systems of equations that arise during finite element calculations. Second, the process of assembling linear systems is considered through careful benchmarking and analysis of the memory traffic involved. This results in clear guidelines for finite element assembly on shared-memory multicore CPUs.
Finally, hardware accelerators are incorporated by extending the FEniCS PDE solver framework to carry out assembly and solution of linear systems on a graphics processing unit (GPU). Example problems show that GPU-accelerated finite element solvers can exhibit substantial speedup over optimised multicore CPU codes. Moreover, the use of automated code generation makes these techniques much more accessible to domain scientists and non-experts.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | PhD Thesis |
Year of Publication | 2021 |
Degree awarding institution | University of Oslo |
Degree | PhD |
Number of Pages | 132 |
Date Published | 09/2020 |
Publisher | University of Oslo |
Place Published | Oslo, Norway |
Other Numbers | ISSN 1501-7710 |
Journal Article
Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
Journal of Parallel and Distributed Computing 144 (2020): 189–205. Status: Published
Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
Parallel computations with irregular memory access patterns are often limited by the memory subsystems of multi-core CPUs, though it can be difficult to pinpoint and quantify performance bottlenecks precisely. We present a method for estimating volumes of data traffic caused by irregular, parallel computations on multi-core CPUs with memory hierarchies containing both private and shared caches. Further, we describe a performance model based on these estimates that applies to bandwidth-limited computations. As a case study, we consider two standard algorithms for sparse matrix–vector multiplication, a widely used, irregular kernel. Using three different multi-core CPU systems and a set of matrices that induce a range of irregular memory access patterns, we demonstrate that our cache simulation combined with the proposed performance model accurately quantifies performance bottlenecks that would not be detected using standard best- or worst-case estimates of the data traffic volume.
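For context, here is a small Python sketch (my own simplification, not the paper's cache simulator) of the standard best- and worst-case traffic estimates that the paper's model improves upon, together with a simple bandwidth-limited time prediction. The 100 GB/s bandwidth figure and the random test matrix are arbitrary illustrative choices.

```python
# Sketch of naive data-traffic bounds and a bandwidth-based time estimate for CSR SpMV.
import numpy as np
import scipy.sparse as sp

def spmv_traffic_bounds(A, bytes_val=8, bytes_idx=4):
    """Best/worst-case data traffic (bytes) for one CSR SpMV y = A @ x."""
    m, n = A.shape
    nnz = A.nnz
    # Matrix values, column indices and row pointers are streamed once; y is written once.
    fixed = nnz * (bytes_val + bytes_idx) + (m + 1) * bytes_idx + m * bytes_val
    # x: best case each entry is loaded once, worst case once per non-zero.
    best = fixed + n * bytes_val
    worst = fixed + nnz * bytes_val
    return best, worst

def predicted_time(traffic_bytes, bandwidth_gbs):
    """Bandwidth-limited execution time estimate."""
    return traffic_bytes / (bandwidth_gbs * 1e9)

# Example: a random sparse matrix and an assumed 100 GB/s memory bandwidth.
A = sp.random(100000, 100000, density=1e-4, format="csr")
best, worst = spmv_traffic_bounds(A)
print("traffic: %.1f--%.1f MB" % (best / 1e6, worst / 1e6))
print("time:    %.2e--%.2e s" % (predicted_time(best, 100), predicted_time(worst, 100)))
```

The gap between the two bounds is exactly what the cache simulation narrows: it predicts where between them the actual traffic lands for a given matrix and cache hierarchy.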
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 144 |
Pagination | 189–205 |
Date Published | 06/2020 |
Publisher | Elsevier |
ISSN | 0743-7315 |
Keywords | AMD Epyc, Cache simulation, Intel Xeon, Performance model, Sparse matrix–vector multiplication |
URL | http://www.sciencedirect.com/science/article/pii/S0743731520302999 |
DOI | 10.1016/j.jpdc.2020.05.020 |
Talks, contributed
Compiling finite element variational forms for GPU-based assembly
In FEniCS'19, Washington, DC, USA, 2019. Status: Published
Compiling finite element variational forms for GPU-based assembly
We present an experimental form compiler for exploring GPU-based algorithms for assembling vectors, matrices, and higher-order tensors from finite element variational forms.
Previous studies by Cecka et al. (2010), Markall et al. (2013), and Reguly and Giles (2015) have explored different strategies for using GPUs for finite element assembly, demonstrating the potential rewards and highlighting some of the difficulties in offloading assembly to a GPU. Even though these studies are limited to a few specific cases, mostly related to the Poisson problem, they already indicate that to achieve high performance, the appropriate assembly strategy depends on the problem at hand and the chosen discretisation.
By using a form compiler to automatically generate code for GPU-based assembly, we can explore a range of problems based on different variational forms and finite element discretisations. In this way, we aim to get a better picture of the potential benefits and challenges of assembling finite element variational forms on a GPU. Ultimately, the goal is to explore algorithms based on automated code generation that offload entire finite element methods to a GPU, including assembly of vectors and matrices and solution of linear systems.
In this talk, we give an exact characterisation of the class of finite element variational forms supported by our compiler, comprising a small subset of the Unified Form Language that is used by FEniCS and Firedrake. Furthermore, we describe a denotational semantics that explains how expressions in the form language are translated to low-level C or CUDA code for performing assembly over a computational mesh. We also present some initial results and discuss the performance of the generated code.
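To give a flavour of such a translation (a toy sketch, not the compiler or the UFL subset described in the talk), the following Python code maps a tiny expression language compositionally to C expressions, in the spirit of a denotational semantics: the meaning of each node is a C string built from the meanings of its children.

```python
# Toy compositional translation of a tiny expression language to C code.
from dataclasses import dataclass

class Expr:
    pass

@dataclass
class Const(Expr):
    value: float

@dataclass
class Var(Expr):
    name: str

@dataclass
class Add(Expr):
    left: Expr
    right: Expr

@dataclass
class Mul(Expr):
    left: Expr
    right: Expr

def to_c(e: Expr) -> str:
    """Compositional (denotational-style) translation of expressions to C."""
    if isinstance(e, Const):
        return repr(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Add):
        return "(%s + %s)" % (to_c(e.left), to_c(e.right))
    if isinstance(e, Mul):
        return "(%s * %s)" % (to_c(e.left), to_c(e.right))
    raise TypeError("unsupported expression: %r" % (e,))

# Example: emit a C function evaluating w*(u + 2*v), a stand-in for an element kernel body.
expr = Mul(Var("w"), Add(Var("u"), Mul(Const(2.0), Var("v"))))
print("double kernel(double u, double v, double w) { return %s; }" % to_c(expr))
```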
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing, Department of Numerical Analysis and Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2019 |
Location of Talk | FEniCS'19, Washington, DC, USA |
Keywords | Code translation, GPU, HPC |
Poster
Quantifying data traffic of sparse matrix-vector multiplication in a multi-level memory hierarchy
London, UK, 2018. Status: Published
Quantifying data traffic of sparse matrix-vector multiplication in a multi-level memory hierarchy
Sparse matrix-vector multiplication (SpMV) is the central operation in an iterative linear solver. On a computer with a multi-level memory hierarchy, SpMV performance is limited by memory or cache bandwidth. Furthermore, for a given sparse matrix, the volume of data traffic depends on the location of the matrix non-zeros. By estimating the volume of data traffic with Aho, Denning and Ullman’s page replacement model [1], we can locate bottlenecks in the memory hierarchy and evaluate optimizations such as matrix reordering. The model is evaluated by comparing with measurements from hardware performance counters on Intel Sandy Bridge.
[1]: Alfred V. Aho, Peter J. Denning, and Jeffrey D. Ullman. 1971. Principles of Optimal Page Replacement. J. ACM 18, 1 (January 1971), pp. 80-93.
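A minimal sketch of the underlying idea (my own simplification, not the poster's model): replay the accesses that CSR SpMV makes to the input vector x through a fixed-size LRU cache (LRU standing in here for the page replacement policies analysed in [1]) and count line misses to estimate the traffic that x contributes at a given cache level. The matrix, cache size, and line size below are illustrative.

```python
# Sketch of estimating x-vector traffic in CSR SpMV via an LRU cache simulation.
from collections import OrderedDict
import scipy.sparse as sp

def lru_misses(addresses, num_lines, line_size=64, word_size=8):
    """Count cache-line misses for a sequence of word addresses under LRU."""
    cache = OrderedDict()
    misses = 0
    for a in addresses:
        line = (int(a) * word_size) // line_size
        if line in cache:
            cache.move_to_end(line)          # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)    # evict least recently used line
    return misses

def x_traffic_estimate(A, cache_bytes, line_size=64):
    """Estimated bytes of x fetched during one CSR SpMV, for one cache level."""
    misses = lru_misses(A.indices, cache_bytes // line_size, line_size)
    return misses * line_size

# Example: compare the estimate with the naive best/worst cases for a random matrix.
A = sp.random(20000, 20000, density=5e-4, format="csr")
est = x_traffic_estimate(A, cache_bytes=32 * 1024)   # e.g. a 32 KiB L1 cache
print("estimated x traffic: %.2f MB" % (est / 1e6))
print("best case: %.2f MB, worst case: %.2f MB" % (A.shape[1] * 8 / 1e6, A.nnz * 8 / 1e6))
```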
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2018 |
Date Published | 06/2018 |
Place Published | London, UK |