Publications
Journal Article
On memory traffic and optimisations for low-order finite element assembly algorithms on multi-core CPUs
ACM Trans. Math. Software (2022). Status: Accepted
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | ACM Trans. Math. Software |
Publisher | Association for Computing Machinery (ACM) |
DOI | 10.1145/3503925 |
Poster
Automated Code Generation for GPU-Based Finite Element Computations in FEniCS
SIAM Conference on Computational Science and Engineering (CSE21): SIAM, 2021. Status: Published
Developing high-performance finite element codes normally requires hand-crafting and fine tuning of computational kernels, which is not an easy task to carry out for each and every problem. Automated code generation has proved to be a highly productive alternative for frameworks like FEniCS, where a compiler is used to automatically generate suitable kernels from high-level mathematical descriptions of finite element problems. This strategy has so far enabled users to develop and run a variety of high-performance finite element solvers on clusters of multicore CPUs. We have recently enhanced FEniCS with GPU acceleration by enabling its internal compiler to generate CUDA kernels that are needed to offload finite element calculations to GPUs, particularly the assembly of linear systems. This poster presents the results of GPU-accelerating FEniCS and explores performance characteristics of auto-generated CUDA kernels and GPU-based assembly of linear systems for finite element methods.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2021 |
Date Published | 03/2021 |
Publisher | SIAM |
Place Published | SIAM Conference on Computational Science and Engineering (CSE21) |
Journal Article
Efficient numerical solution of the EMI model representing the extracellular space (E), cell membrane (M) and intracellular space (I) of a collection of cardiac cells
Frontiers in Physics 8 (2021): 579461. Status: Published
The EMI model represents excitable cells in a more accurate manner than traditional homogenized models at the price of increased computational complexity. The increased complexity of solving the EMI model stems from a significant increase in the number of computational nodes and from the form of the linear systems that need to be solved. Here, we will show that the latter problem can be solved by careful use of operator splitting of the spatially coupled equations. By using this method, the linear systems can be broken into sub-problems that are of the classical type of linear, elliptic boundary value problems. Therefore, the vast collection of methods for solving linear, elliptic partial differential equations can be used. We demonstrate that this enables us to solve the systems using shared-memory parallel computers. The computing time scales perfectly with the number of physical cells. For a collection of 512×256 cells, we manage to solve linear systems with about 2.5×10^8 unknowns. Since the computational effort scales linearly with the number of physical cells, we believe that larger computers can be used to simulate millions of excitable cells and thus allow careful analysis of physiological systems of great importance.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Frontiers in Physics |
Volume | 8 |
Pagination | 579461 |
Publisher | Frontiers |
URL | https://www.frontiersin.org/articles/10.3389/fphy.2020.579461/full |
DOI | 10.3389/fphy.2020.579461 |
On the impact of heterogeneity-aware mesh partitioning and non-contributing computation removal on parallel reservoir simulations
Journal of Mathematics in Industry 11 (2021). Status: Published
Parallel computations have become standard practice for simulating the complicated multi-phase flow in a petroleum reservoir. Increasingly sophisticated numerical techniques have been developed in this context. During the chase of algorithmic superiority, however, there is a risk of forgetting the ultimate goal, namely, to efficiently simulate real-world reservoirs on realistic parallel hardware platforms. In this paper, we quantitatively analyse the negative performance impact caused by non-contributing computations that are associated with the “ghost computational cells” per subdomain, which is an insufficiently studied subject in parallel reservoir simulation. We also show how these non-contributing computations can be avoided by reordering the computational cells of each subdomain, such that the ghost cells are grouped together. Moreover, we propose a new graph-edge weighting scheme that can improve the mesh partitioning quality, aiming at a balance between handling the heterogeneity of geological properties and restricting the communication overhead. To put the study in a realistic setting, we enhance the open-source Flow simulator from the OPM framework, and provide comparisons with industrial-standard simulators for real-world reservoir models.
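The reordering idea in the abstract above can be sketched as follows. This is a minimal illustration, not the OPM Flow implementation: the function name `reorder_ghost_last`, the `is_ghost` flags, and the sample values are all hypothetical. Once ghost cells are grouped at the end of each subdomain's numbering, a loop over owned cells becomes a contiguous range and the non-contributing work is skipped without per-cell tests.

```python
def reorder_ghost_last(is_ghost):
    """Return (perm, n_owned) where perm[new_index] = old_index.

    Owned cells keep their relative order and come first;
    ghost cells are grouped together at the end.
    """
    owned = [i for i, g in enumerate(is_ghost) if not g]
    ghost = [i for i, g in enumerate(is_ghost) if g]
    return owned + ghost, len(owned)


# Hypothetical subdomain with six cells, two of which are ghost cells.
is_ghost = [False, True, False, False, True, False]
values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]

perm, n_owned = reorder_ghost_last(is_ghost)
reordered = [values[old] for old in perm]

# After reordering, work restricted to owned cells is a plain slice.
owned_sum = sum(reordered[:n_owned])
```

The permutation would also have to be applied to the rows and columns of the subdomain matrix; that step is omitted here.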
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Journal of Mathematics in Industry |
Volume | 11 |
Date Published | 06/2021 |
Publisher | Springer |
URL | https://mathematicsinindustry.springeropen.com/articles/10.1186/s13362-0... |
DOI | 10.1186/s13362-021-00108-5 |
Proceedings, refereed
iPUG for multiple Graphcore IPUs: Optimizing performance and scalability of parallel breadth-first search
In 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC). Bangalore, India: IEEE, 2021. Status: Published
Parallel graph algorithms have become one of the principal applications of high-performance computing besides numerical simulations and machine learning workloads. However, due to their highly unstructured nature, graph algorithms remain extremely challenging for most parallel systems, with large gaps between observed performance and theoretical limits. Furthermore, most mainstream architectures rely heavily on single instruction multiple data (SIMD) processing for high floating-point rates, which is not beneficial for graph processing, which instead requires high memory bandwidth, low memory latency, and efficient processing of unstructured data. On the other hand, we are currently observing an explosion of new hardware architectures, many of which are adapted to specific purposes and diverge from traditional designs. A notable example is the Graphcore Intelligence Processing Unit (IPU), which is developed to meet the needs of upcoming machine intelligence applications. Its design eschews the traditional cache hierarchy, relying on SRAM as its main memory instead. The result is an extremely high-bandwidth, low-latency memory at the cost of capacity. In addition, the IPU consists of a large number of independent cores, allowing for true multiple instruction multiple data (MIMD) processing. Together, these features suggest that such a processor is well suited for graph processing. We test the limits of graph processing on multiple IPUs by implementing a low-level, high-performance code for breadth-first search (BFS), following the specifications of Graph500, the most widely used benchmark for parallel graph processing. Despite the simplicity of the BFS algorithm, implementing efficient parallel codes for it has proven to be a challenging task in the past. We show that our implementation scales well on a system with 8 IPUs and attains roughly twice the performance of an equal number of NVIDIA V100 GPUs using state-of-the-art CUDA code.
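For readers unfamiliar with the kernel being benchmarked, a sequential, level-synchronous BFS that produces a parent array can be sketched as below. This is only a reference sketch in Python, not the IPU code described in the abstract; the adjacency-list input, the function name `bfs_parents`, and the convention that the root is its own parent and unreached vertices get -1 are assumptions made for this example.

```python
def bfs_parents(adj, root):
    """Level-synchronous BFS returning a parent array.

    adj[u] lists the neighbours of vertex u. Unreached vertices
    keep parent -1; the root is recorded as its own parent.
    """
    parent = [-1] * len(adj)
    parent[root] = root
    frontier = [root]
    while frontier:
        next_frontier = []
        # In a parallel code, the vertices of one frontier are
        # processed concurrently; here they are visited in order.
        for u in frontier:
            for v in adj[u]:
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


# Small hypothetical graph; vertex 5 is isolated.
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
parents = bfs_parents(adj, 0)
```

The indirect accesses `parent[v]` are what make the memory traffic irregular, which is why memory latency and bandwidth matter more than floating-point rate for this kernel.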
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing, SparCity: An Optimization and Co-design Framework for Sparse Computation |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC) |
Pagination | 162-171 |
Date Published | 12/2021 |
Publisher | IEEE |
Place Published | Bangalore, India |
DOI | 10.1109/HiPC53243.2021.00030 |
Book Chapter
Operator Splitting and Finite Difference Schemes for Solving the EMI Model
In Modeling Excitable Tissue: The EMI Framework, 44-55. Vol. 7. Cham: Springer International Publishing, 2021. Status: Published
We want to be able to perform accurate simulations of a large number of cardiac cells based on mathematical models where each individual cell is represented in the model. This implies that the computational mesh has to have a typical resolution of a few µm, leading to huge computational challenges. In this paper we use a certain operator splitting of the coupled equations and show that this leads to systems that can be solved in parallel. This opens up the possibility of simulating large numbers of coupled cardiac cells.
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, Department of High Performance Computing |
Publication Type | Book Chapter |
Year of Publication | 2021 |
Book Title | Modeling Excitable Tissue: The EMI Framework |
Volume | 7 |
Chapter | 4 |
Pagination | 44 - 55 |
Publisher | Springer International Publishing |
Place Published | Cham |
ISBN Number | 978-3-030-61156-9 |
ISSN | 2512-1677 |
URL | http://link.springer.com/content/pdf/10.1007/978-3-030-61157-6_4 |
DOI | 10.1007/978-3-030-61157-6_4 |
Journal Article
Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
Journal of Parallel and Distributed Computing 144 (2020): 189-205. Status: Published
Parallel computations with irregular memory access patterns are often limited by the memory subsystems of multi-core CPUs, though it can be difficult to pinpoint and quantify performance bottlenecks precisely. We present a method for estimating volumes of data traffic caused by irregular, parallel computations on multi-core CPUs with memory hierarchies containing both private and shared caches. Further, we describe a performance model based on these estimates that applies to bandwidth-limited computations. As a case study, we consider two standard algorithms for sparse matrix–vector multiplication, a widely used, irregular kernel. Using three different multi-core CPU systems and a set of matrices that induce a range of irregular memory access patterns, we demonstrate that our cache simulation combined with the proposed performance model accurately quantifies performance bottlenecks that would not be detected using standard best- or worst-case estimates of the data traffic volume.
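To make the source of the irregular memory traffic concrete, here is a minimal sparse matrix-vector multiplication sketch. The abstract does not say which two algorithms were studied; the compressed sparse row (CSR) layout used here is assumed only because it is a common choice, and the function name `spmv_csr` and the sample matrix are hypothetical.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # vals and col_idx are streamed sequentially, but
            # x[col_idx[k]] is an indirect, data-dependent access.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]].
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, vals, x)
```

How often each entry of `x` must be reloaded from memory depends on the sparsity pattern and the cache hierarchy, which is the quantity a cache simulation would estimate.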
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 144 |
Pagination | 189-205 |
Date Published | 06/2020 |
Publisher | Elsevier |
ISSN | 0743-7315 |
Keywords | AMD Epyc, Cache simulation, Intel Xeon, Performance model, Sparse matrix–vector multiplication |
URL | http://www.sciencedirect.com/science/article/pii/S0743731520302999 |
DOI | 10.1016/j.jpdc.2020.05.020 |
Poster
Efficient simulations of patient-specific electrical heart activity on the DGX-2
GPU Technology Conference (GTC) 2020, Silicon Valley, USA: Nvidia, 2020. Status: Published
Patients who have suffered a heart attack have an elevated risk of developing arrhythmia. The use of computer simulations of the electrical activity in the hearts of these patients is emerging as an alternative to traditional, more invasive examinations performed by doctors today. Recent advances in personalised arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations of the electrical activity in the heart require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively. Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
We have developed a code that is capable of performing highly efficient heart simulations on the DGX-2, making use of all 16 V100 GPUs. Using a patient-specific unstructured tetrahedral mesh with 11.7 million cells, we are able to simulate the electrical heart activity at 1/30 of real-time. Moreover, we are able to show that the throughput achieved using all 16 GPUs in the DGX-2 is 77.6% of the theoretical maximum.
We achieved this through extensive optimisations of the two kernels constituting the body of the main loop in the simulator. In the kernel solving the diffusion equation (governing the spread of the electrical signal), consisting of a sparse matrix-vector multiplication, we minimise the memory traffic by reordering the mesh (and matrix) elements into clusters that fit in the V100's L2 cache. In the kernel solving the cell model (describing the complex interactions of ion channels in the cell membrane), we apply sophisticated domain-specific optimisations to reduce the number of floating point operations to the point where the kernel becomes memory bound. After optimisation, both kernels are memory bound, and we derive the minimum memory traffic, which we then divide by the aggregate memory bandwidth to obtain a lower bound on the execution time.
Topics discussed include optimisations for sparse matrix-vector multiplications, strategies for handling inter-device communication for unstructured meshes, and lessons we learnt while programming the DGX-2.
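The lower bound described above (minimum memory traffic divided by aggregate memory bandwidth) amounts to a one-line calculation, sketched here. The function names and all numbers are made up for illustration; they are not the DGX-2 specifications or the measurements reported on the poster.

```python
def lower_bound_seconds(min_bytes, bandwidth_bytes_per_s, num_devices):
    """Lower bound on run time for a memory-bound kernel.

    min_bytes is the minimum data volume that must be moved in total,
    and the aggregate bandwidth is the per-device bandwidth times the
    number of devices (assuming the work is evenly distributed).
    """
    return min_bytes / (bandwidth_bytes_per_s * num_devices)


def fraction_of_bound(bound_seconds, measured_seconds):
    """Fraction of the theoretical maximum throughput achieved."""
    return bound_seconds / measured_seconds


# Hypothetical values: 8 GB of unavoidable traffic, 16 devices
# with 1 GB/s each, and a measured run time of 0.625 s.
bound = lower_bound_seconds(8.0e9, 1.0e9, 16)
achieved = fraction_of_bound(bound, 0.625)
```

With these made-up inputs the bound is 0.5 s and the measured run reaches 80% of it; a figure such as the 77.6% quoted above would be obtained the same way from real traffic and timing data.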
Affiliation | Scientific Computing |
Project(s) | Department of Computational Physiology, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2020 |
Date Published | 03/2020 |
Publisher | Nvidia |
Place Published | GPU Technology Conference (GTC) 2020, Silicon Valley, USA |
Towards detailed Organ-Scale Simulations in Cardiac Electrophysiology
GPU Technology Conference (GTC), Silicon Valley, San Jose, USA, 2020. Status: Published
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2020 |
Place Published | GPU Technology Conference (GTC), Silicon Valley, San Jose, USA |
Type of Work | Poster |
Talks, contributed
Balancing the numerical and parallel performance for reservoir simulations
In SIAM Conference on Computational Science and Engineering (CSE19), Spokane, Washington, USA, 2019. Status: Published
The overall performance of a PDE-based simulator depends on two factors: the algorithmic efficiency of the numerical scheme chosen and the parallel efficiency of the software implementation. Since aspects from the two factors may influence each other's performance, a suitable balance between the two is important. The focus of this talk is on the OPM framework of oil reservoir simulation, for which the computational core is to solve the black-oil model: a coupled system of nonlinear PDEs. Due to large variations in the geological properties of a reservoir, the sparse matrix that arises from discretizing the coupled PDEs exhibits a strong heterogeneity in its nonzero values. These reflect the strength of coupling between the degrees of freedom. It is thus necessary to consider this heterogeneity in the unstructured mesh partitioning process, typically translated to partitioning a graph with weighted edges. Particularly, we study the impact of different strategies of edge weighting on both the numerical and parallel performance. The ordering of the degrees of freedom, which also affects both sides, is studied in addition. Our purpose is to shed some light on a suitable mesh partitioning and ordering methodology, which is also relevant beyond the context of reservoir simulation. The issue of how to allow users of OPM to inject such flexibility into the existing software framework is also discussed.
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Publication Type | Talks, contributed |
Year of Publication | 2019 |
Location of Talk | SIAM Conference on Computational Science and Engineering (CSE19), Spokane, Washington, USA |
Keywords | HPC |
Compiling finite element variational forms for GPU-based assembly
In FEniCS'19, Washington DC, USA, 2019. Status: Published
We present an experimental form compiler for exploring GPU-based algorithms for assembling vectors, matrices, and higher-order tensors from finite element variational forms.
Previous studies by Cecka et al. (2010), Markall et al. (2013), and Reguly and Giles (2015) have explored different strategies for using GPUs for finite element assembly, demonstrating the potential rewards and highlighting some of the difficulties in offloading assembly to a GPU. Even though these studies are limited to a few specific cases, mostly related to the Poisson problem, they already indicate that to achieve high performance, the appropriate assembly strategy depends on the problem at hand and the chosen discretisation.
By using a form compiler to automatically generate code for GPU-based assembly, we can explore a range of problems based on different variational forms and finite element discretisations. In this way, we aim to get a better picture of the potential benefits and challenges of assembling finite element variational forms on a GPU. Ultimately, the goal is to explore algorithms based on automated code generation that offload entire finite element methods to a GPU, including assembly of vectors and matrices and solution of linear systems.
In this talk, we give an exact characterisation of the class of finite element variational forms supported by our compiler, comprising a small subset of the Unified Form Language that is used by FEniCS and Firedrake. Furthermore, we describe a denotational semantics that explains how expressions in the form language are translated to low-level C or CUDA code for performing assembly over a computational mesh. We also present some initial results and discuss the performance of the generated code.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing, Department of Numerical Analysis and Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2019 |
Location of Talk | FEniCS'19, Washington DC, USA |
Keywords | Code translation, GPU, HPC |
Proceedings, refereed
Combining algorithmic rethinking and AVX-512 intrinsics for efficient simulation of subcellular calcium signaling
In International Conference on Computational Science (ICCS 2019). Springer, 2019. Status: Published
Calcium signaling is vital for the contraction of the heart. Physiologically realistic simulation of this subcellular process requires nanometer resolutions and a complicated mathematical model of differential equations. Since the subcellular space is composed of several irregularly-shaped and intricately-connected physiological domains with distinct properties, one particular challenge is to correctly compute the diffusion-induced calcium fluxes between the physiological domains. The common approach is to pre-calculate the effective diffusion coefficients between all pairs of neighboring computational voxels, and store them in large arrays. Such a strategy avoids complicated if-tests when looping through the computational mesh, but suffers from substantial memory overhead. In this paper, we adopt a memory-efficient strategy that uses a small lookup table of diffusion coefficients. The memory footprint and traffic are both drastically reduced, while also avoiding the if-tests. However, the new strategy induces more instructions on the processor level. To offset this potential performance pitfall, we use AVX-512 intrinsics to effectively vectorize the code. Performance measurements on a Knights Landing processor and a quad-socket Skylake server show a clear performance advantage of the manually vectorized implementation that uses lookup tables, over the counterpart using coefficient arrays.
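The contrast between the two storage strategies in the abstract can be illustrated with a toy one-dimensional example. Everything here is a simplifying assumption: the domain labels, the 2x2 coefficient table, and the function names are hypothetical, the real problem is three-dimensional, and no AVX-512 vectorization is shown.

```python
def fluxes_with_arrays(conc, coeff):
    """Common approach: one precomputed coefficient per voxel face."""
    return [coeff[i] * (conc[i + 1] - conc[i])
            for i in range(len(conc) - 1)]


def fluxes_with_lookup(conc, domain, table):
    """Memory-efficient approach: look the coefficient up from the
    domain labels of the two neighbouring voxels, without if-tests."""
    return [table[domain[i]][domain[i + 1]] * (conc[i + 1] - conc[i])
            for i in range(len(conc) - 1)]


# Hypothetical data: two physiological domains (0 and 1) and a small
# symmetric table of effective diffusion coefficients between them.
table = [[1.0, 0.5],
         [0.5, 2.0]]
domain = [0, 0, 1, 1]
conc = [1.0, 2.0, 4.0, 8.0]

# The array-based variant stores one value per face ...
coeff = [table[domain[i]][domain[i + 1]] for i in range(len(domain) - 1)]

# ... while the lookup variant stores only the table and one label
# per voxel, at the price of extra indexing instructions.
flux_arrays = fluxes_with_arrays(conc, coeff)
flux_lookup = fluxes_with_lookup(conc, domain, table)
```

Both variants give the same fluxes; the difference is in memory footprint and instruction count, which is the trade-off the paper addresses with manual vectorization.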
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | International Conference on Computational Science (ICCS 2019) |
Pagination | 681-687 |
Publisher | Springer |
DOI | 10.1007/978-3-030-22750-0_66 |
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
In Computing in Cardiology. Vol. 46. IEEE, 2019. Status: Published
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively. Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
Fortunately, the cost of computing has fallen dramatically in the past decade. A prominent reason for this is the recent introduction of manycore processors such as GPUs, which by now power the majority of the world’s leading supercomputers. These devices owe their success to the fact that they are optimized for massively parallel workloads, such as applying similar ODE kernel computations to millions of mesh elements in scientific computing applications. Unlike CPUs, which are typically optimized for sequential performance, this allows GPU architectures to dedicate more transistors to performing computations, thereby increasing parallel speed and energy efficiency.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | Computing in Cardiology |
Volume | 46 |
Date Published | 12/2019 |
Publisher | IEEE |
Talks, invited
Heterogeneous computing for cardiac electrophysiology
In PREAPP workshop on Efficient Frameworks for Compute- and Data-intensive Computing (EFFECT), University of Tromsø, Norway, 2019. Status: Published
Electrical activities inside the heart are immensely important for the functioning of this vital organ. In the pursuit of a scientific understanding of the processes and mechanisms in electro-physiology, computer simulations have become an established paradigm of research. Both the complex mathematical models and the extreme physiological details require huge-scale simulations, which nowadays see an increasing use of heterogeneous computing. That is, the computational power is delivered by more than one processor type. We will discuss some of the resulting challenges in programming and performance optimization. Successful applications from the domain of cardiac electro-physiology will be used to demonstrate the usefulness of heterogeneous computing. We will also take a peek into the future of heterogeneous computing through eX3: the brand-new national infrastructure for experimental exploration of exascale computing.
Affiliation | Scientific Computing |
Project(s) | PREAPP: PRoductivity and Energy-efficiency through Abstraction-based Parallel Programming, Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | PREAPP workshop on Efficient Frameworks for Compute- and Data-intensive Computing (EFFECT), University of Tromsø, Norway |
Unstructured computational meshes and data locality
In Fifth Workshop on Programming Abstractions for Data Locality (PADAL'19), Inria Bordeaux, France, 2019. Status: Published
Many scientific and engineering applications rely on unstructured computational meshes to capture the irregular shapes and intricate details involved. With respect to software implementation, unstructured meshes require indirectly-indexed, irregular accesses to data arrays. Attaining data locality in the memory hierarchy is thus challenging. This talk touches on two related topics. First, we look at the ordering/clustering of entities in an unstructured mesh with respect to cache efficiency. Second, we re-examine the currently widely-used strategy of mesh partitioning, which is based on partitioning a corresponding graph with edge-cut as the optimisation objective. Mismatches between this mainstream methodology of data decomposition and the increasingly heterogeneous computing platforms will be discussed.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | Fifth Workshop on Programming Abstractions for Data Locality (PADAL'19), Inria Bordeaux, France |
Keywords | data locality, HPC |
Journal Article
Performance optimization and modeling of fine-grained irregular communication in UPC
Scientific Programming 2019 (2019): Article ID 6825728. Status: Published
The UPC programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory sub-systems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. The programmer friendliness, however, can come at the cost of substantial performance penalties. This is especially true when indirectly indexing the elements of a shared array, for which the induced between-thread data communication can be irregular and have a fine-grained pattern. In this paper we study performance enhancement strategies specifically targeting such fine-grained irregular communication in UPC. Starting from explicit thread privatization, continuing with block-wise communication, and arriving at message condensing and consolidation, we obtained considerable performance improvement of UPC programs that originally require fine-grained irregular communication. Besides the performance enhancement strategies, the main contribution of the present paper is to propose performance models for the different scenarios, in form of quantifiable formulas that hinge on the actual volumes of various data movements plus a small number of easily obtainable hardware characteristic parameters. These performance models help to verify the enhancements obtained, while also providing insightful predictions of similar parallel implementations, not limited to UPC, that also involve between-thread or between-process irregular communication. As a further validation, we also apply our performance modeling methodology and hardware characteristic parameters to an existing UPC code for solving a 2D heat equation on a uniform mesh.
Affiliation | Scientific Computing |
Project(s) | PREAPP: PRoductivity and Energy-efficiency through Abstraction-based Parallel Programming, Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Journal Article |
Year of Publication | 2019 |
Journal | Scientific Programming |
Volume | 2019 |
Pagination | Article ID 6825728 |
Date Published | 03/2019 |
Publisher | Hindawi |
Keywords | Fine-grained irregular communication, performance modeling, Performance optimization, Sparse matrix-vector multiplication, UPC programming language |
URL | https://www.hindawi.com/journals/sp/2019/6825728/ |
DOI | 10.1155/2019/6825728 |
Poster
Towards Detailed Real-Time Simulations of Cardiac Arrhythmia
International Conference in Computing in Cardiology, Singapore, 2019. Status: Published
Recent advances in personalized arrhythmia risk prediction show that computational models can provide not only safer but also more accurate results than invasive procedures. However, biophysically accurate simulations require solving linear systems over fine meshes and time resolutions, which can take hours or even days. This limits the use of such simulations in the clinic where diagnosis and treatment planning can be time sensitive, even if it is just for the reason of operation schedules. Furthermore, the non-interactive, non-intuitive way of accessing simulations and their results makes it hard to study these collaboratively.
Overcoming these limitations requires speeding up computations from hours to seconds, which requires a massive increase in computational capabilities.
Fortunately, the cost of computing has fallen dramatically in the past decade. A prominent reason for this is the recent introduction of manycore processors such as GPUs, which by now power the majority of the world’s leading supercomputers. These devices owe their success to the fact that they are optimized for massively parallel workloads, such as applying similar ODE kernel computations to millions of mesh elements in scientific computing applications. Unlike CPUs, which are typically optimized for sequential performance, this allows GPU architectures to dedicate more transistors to performing computations, thereby increasing parallel speed and energy efficiency.
In this poster, we present ongoing work on the parallelization of finite volume computations over an unstructured mesh as well as the challenges involved in building scalable simulation codes and discuss the steps needed to close the gap to accurate real-time computations.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Publication Type | Poster |
Year of Publication | 2019 |
Date Published | 09/2019 |
Place Published | International Conference in Computing in Cardiology, Singapore |
Talks, contributed
Education in HPC and Data Science at Simula Research Lab and UiO
In SUPERDATA Workshop on curriculum development, Yunan, China, 2018. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2018 |
Location of Talk | SUPERDATA Workshop on curriculum development, Yunan, China |
Unstructured mesh partitioning in the presence of strong coefficient heterogeneity
In PDESoft 2018 Conference, Bergen, Norway, 2018. Status: Published
Mesh partitioning is the first step in enabling parallel computation for solving PDEs. For an unstructured computational mesh, partitioning is a nontrivial task that can be formulated as an optimization problem with two goals. The first goal is load balancing, i.e., dividing the computational work evenly among the subdomains. The second goal is minimizing communication overhead, i.e., limiting the subsequent inter-subdomain communication. Traditionally, an unstructured mesh is translated to a graph before partitioning, where the graph's nodes correspond to the mesh entities and the edges reflect the direct couplings between mesh entities.
In the presence of strong coefficient heterogeneity in a PDE, the coupling between the degrees of freedom (e.g., the nonzero values in a linear system arising from the discretization) will also exhibit strong heterogeneity. It is numerically beneficial to group the degrees of freedom that are strongly coupled to each other in the same subdomain, whereas the weaker couplings are prioritized as places for the separation cuts between subdomains. This is achieved by heterogeneously weighting the edges of the graph, which is then partitioned by a graph partitioning algorithm that almost exclusively aims to minimize the so-called edge cut (the sum of the weights of all cut edges).
Such an approach requires care, because over-weighting the edges that represent strong couplings may leave too few "light-weight" edges as candidates for cuts between subdomains. This may in turn lead to poor partitioning results where, e.g., a subdomain consists of several disjoint patches. An accompanying weakness is that the associated edge-cut value no longer bears resemblance to the true communication overhead that will arise in the parallel solution process of the PDE.
This talk will report the results of ongoing work on investigating the weaknesses of the edge-cut-oriented graph partitioning strategy in the presence of strong coefficient heterogeneity. We will use concrete examples from reservoir simulations that involve strong disparity in key geological coefficients. Moreover, we aim to experiment with possible improvements to the current graph partitioning paradigm by giving less emphasis to edge cut while incorporating a new metric that better resembles the overhead of inter-subdomain communication.
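The gap between the weighted edge cut and the actual communication overhead can be illustrated with a minimal Python sketch. The helper names `edge_cut` and `communication_volume`, the six-node path graph, and its weights are hypothetical and not taken from the talk. The communication metric used here (each node counted once per foreign subdomain it is coupled to) is only an assumption about what a more faithful metric could look like.

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Sum of the weights of all edges whose endpoints lie in different subdomains."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def communication_volume(edges, part):
    """Count each node once per foreign subdomain it is coupled to (edge weights ignored)."""
    foreign = defaultdict(set)
    for u, v, _ in edges:
        if part[u] != part[v]:
            foreign[u].add(part[v])
            foreign[v].add(part[u])
    return sum(len(subdomains) for subdomains in foreign.values())


# Hypothetical path graph 0-1-2-3-4-5 with alternating weak (1.0) and strong (100.0) couplings.
edges = [(0, 1, 1.0), (1, 2, 100.0), (2, 3, 1.0), (3, 4, 100.0), (4, 5, 1.0)]

part_a = [0, 0, 0, 1, 1, 1]  # cuts the weak coupling between nodes 2 and 3
part_b = [0, 0, 1, 1, 1, 1]  # cuts the strong coupling between nodes 1 and 2

for name, part in (("A", part_a), ("B", part_b)):
    print(name, edge_cut(edges, part), communication_volume(edges, part))
```

In this toy case both partitions exchange the same amount of data, yet their edge cuts differ by a factor of 100, which is the kind of mismatch the abstract describes.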
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2018 |
Location of Talk | PDESoft 2018 Conference, Bergen, Norway |
Talks, invited
Heterogeneous Computing: Programming, Performance and Applications
In CoSaS 2018 Symposium, Erlangen, Germany, 2018. Status: Published
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talks, invited |
Year of Publication | 2018 |
Location of Talk | CoSaS 2018 Symposium, Erlangen, Germany |
Type of Talk | Invited keynote talk |
Proceedings, refereed
Memory Bandwidth Contention: Communication vs Computation Tradeoffs in Supercomputers with Multicore Architectures
In International Conference on Parallel and Distributed Systems (ICPADS). Singapore: ACM/IEEE, 2018. Status: Published
We study the problem of contention for memory bandwidth between computation and communication in supercomputers that feature multicore CPUs. The problem arises when communication and computation are overlapped, and both operations compete for the same memory bandwidth. This contention is most visible at the limits of scalability, when communication and computation take similar amounts of time, and thus must be taken into account in order to reach maximum scalability in memory bandwidth bound applications. Typical examples of codes affected by the memory bandwidth contention problem are sparse matrix-vector computations, graph algorithms, and many machine learning problems, as they typically exhibit a high demand for both memory bandwidth and inter-node communication, while performing a relatively low number of arithmetic operations.
The problem is even more relevant in truly heterogeneous computations where CPUs and accelerators are used in concert. In that case it can lead to mispredictions of expected performance and consequently to suboptimal load balancing between CPU and accelerator, which in turn can lead to idling of powerful accelerators and thus to a large decrease in performance.
We propose a simple benchmark in order to quantify the loss of performance due to memory bandwidth contention. Based on that, we derive a theoretical model to determine the impact of the phenomenon on parallel memory-bound applications. We test the model on scientific computations, discuss the practical relevance of the problem and suggest possible techniques to remedy it.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | International Conference on Parallel and Distributed Systems (ICPADS) |
Publisher | ACM/IEEE |
Place Published | Singapore |
Keywords | Hybrid MPI/OpenMP, Memory bandwidth contention, Multicore supercomputers, performance modeling, Scientific Computing |
Poster
Quantifying data traffic of sparse matrix-vector multiplication in a multi-level memory hierarchy
London, UK, 2018. Status: Published
Sparse matrix-vector multiplication (SpMV) is the central operation in an iterative linear solver. On a computer with a multi-level memory hierarchy, SpMV performance is limited by memory or cache bandwidth. Furthermore, for a given sparse matrix, the volume of data traffic depends on the location of the matrix non-zeros. By estimating the volume of data traffic with Aho, Denning and Ullman’s page replacement model [1], we can locate bottlenecks in the memory hierarchy and evaluate optimizations such as matrix reordering. The model is evaluated by comparing with measurements from hardware performance counters on Intel Sandy Bridge.
[1]: Alfred V. Aho, Peter J. Denning, and Jeffrey D. Ullman. 1971. Principles of Optimal Page Replacement. J. ACM 18, 1 (January 1971), pp. 80-93.
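As a rough illustration of why the location of the non-zeros matters, the following Python sketch shows SpMV in compressed sparse row (CSR) format together with a simple miss counter for accesses to the input vector. The function names, the tiny example data, and the cache parameters are hypothetical. The counter uses a plain LRU cache as a stand-in; it is not the Aho, Denning and Ullman model used in the poster.

```python
from collections import OrderedDict


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # The access x[col_idx[k]] is irregular: it follows the non-zero pattern.
            y[i] += values[k] * x[col_idx[k]]
    return y


def count_x_line_misses(col_idx, elems_per_line, num_lines):
    """Count cache-line misses on x for an LRU cache holding num_lines lines."""
    cache = OrderedDict()
    misses = 0
    for j in col_idx:
        line = j // elems_per_line
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
print(spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0]))

# Two hypothetical column-index streams touching the same entries of x in different orders.
scattered = [0, 4, 1, 5, 0, 4, 1, 5]
clustered = [0, 1, 0, 1, 4, 5, 4, 5]
print(count_x_line_misses(scattered, 2, 1), count_x_line_misses(clustered, 2, 1))
```

With a single cached line of two elements, the scattered stream misses on every access while the clustered stream misses only twice, which hints at why reordering can reduce data traffic.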
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2018 |
Date Published | 06/2018 |
Place Published | London, UK |
Towards Detailed Organ-Scale Simulations in Cardiac Electrophysiology
International Symposium on Computational Science at Scale (CoSaS), Erlangen, Germany, 2018. Status: Published
We present implementations of tissue-scale 3D simulations of the human cardiac ventricle using a physiologically realistic cell model. Computational challenges in such simulations arise from two factors, the first of which is the sheer amount of computation when simulating a large number of cardiac cells in a detailed model containing 10^4 calcium release units, 10^6 stochastically changing ryanodine receptors and 1.5 × 10^5 L-type calcium channels per cell.
Additional challenges arise from the fact that the computational tasks have various levels of arithmetic intensity and control complexity, which require careful adaptation of the simulation code to the target device. By exploiting the strengths of GPUs and manycore accelerators, we obtain a performance that is far superior to that of the basic CPU implementation, thus paving the way for detailed whole-heart simulations in future generations of leadership class supercomputers.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Poster |
Year of Publication | 2018 |
Date Published | 09/2018 |
Place Published | International Symposium on Computational Science at Scale (CoSaS), Erlangen, Germany |
Type of Work | Poster |
Keywords | Cardiac electrophysiology, GPU, Scientific Computing, Xeon Phi |
Talk, keynote
Accelerated high-performance computing for computational cardiac electrophysiology
In The University of Tokyo, Tokyo, Japan, 2017. Status: Published
Massively parallel hardware accelerators, such as GPUs, are nowadays prevalent in the HPC hardware landscape. While having tremendous computing power, these accelerators also bring programming challenges. Often, a different programming standard applies for the accelerators than that for the conventional CPUs. For computing clusters that consist of both accelerators and CPUs, where the latter are hosts of the accelerators, elaborate hybrid parallel programming is needed to ensure an efficient use of the heterogeneous hardware.
This talk aims to share some experiences of implementing computational science software for heterogeneous computing platforms. We look at two scenarios: CPU+GPU [1] and CPU+Xeon Phi [2][3] heterogeneous computing. Common to both scenarios is the need for proper pipelining of the computational and communication tasks involved, such that the overhead of various data movements can be reduced or completely masked. Moreover, suitable multi-threading with thread divergence is needed on the CPU host side. This serves to enforce computation-communication overlap, coordinate the accelerators, and allow the CPU hosts to also contribute their computing power. We have successfully applied hybrid CPU+Knights Corner co-processor computing [2][3] to two topics of computational cardiac electrophysiology, making use of the Tianhe-2 supercomputer. Results [4] about using the new Xeon Phi Knights Landing processor will also be presented.
[1]. J. Langguth, M. Sourouri, G. T. Lines, S. B. Baden, and X. Cai. Scalable heterogeneous CPU-GPU computations for unstructured tetrahedral meshes. IEEE Micro, 35(4):6–15, 2015.
[2]. J. Chai, J. Hake, N. Wu, M. Wen, X. Cai, G. T. Lines, J. Yang, H. Su, C. Zhang, and X. Liao. Towards simulation of subcellular calcium dynamics at nanometre resolution. International Journal of High Performance Computing Applications, 29(1):51–63, 2015.
[3]. J. Langguth, Q. Lan, N. Gaur, and X. Cai. Accelerating detailed tissue-scale 3D cardiac simulations using heterogeneous CPU-Xeon Phi computing. International Journal of Parallel Programming, 45(5):1236–1258, 2017.
[4]. J. Langguth, C. Jarvis, and X. Cai. Porting tissue-scale cardiac simulations to the Knights Landing platform. Proceedings of ISC High Performance 2017, 376–388, 2017.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Talk, keynote |
Year of Publication | 2017 |
Location of Talk | The University of Tokyo, Tokyo, Japan |
Notes | 2nd International Symposium on Research and Education of Computational Science |
Proceedings, refereed
Automated Translation of MATLAB Code to C++ with Performance and Traceability
In The Eleventh International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP 2017). International Academy, Research and Industry Association (IARIA), 2017. Status: Published
In this paper, we discuss the implementation and performance of m2cpp: an automated translator from MATLAB code to its matching Armadillo counterpart in the C++ language. A non-invasive strategy has been adopted, meaning that the user of m2cpp does not insert annotations or additional code lines into the input serial MATLAB code. Instead, a combination of code analysis, automated preprocessing and a user-editable metainfo file ensures that m2cpp overcomes some specialties of the MATLAB language, such as implicit typing of variables and multiple return values from functions. Thread-based parallelisation, using either OpenMP or Intel's Threading Building Blocks (TBB) library, can also be carried out by m2cpp for designated for-loops. Such an automated and non-invasive strategy allows maintaining an independent MATLAB code base that is favoured by algorithm developers, while an updated translation into the easily readable C++ counterpart can be obtained at any time. Illustrating examples from seismic data processing are provided in this paper, with performance results obtained on multicore Sandy Bridge CPUs and Intel's Knights-Landing Xeon Phi processor.
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2017 |
Conference Name | The Eleventh International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP 2017) |
Pagination | 50-55 |
Date Published | 11/2017 |
Publisher | International Academy, Research and Industry Association (IARIA) |
ISBN Number | 978-1-61208-599-9 |
ISSN Number | 2308-4499 |
Keywords | C++, Code translation, Image processing, Matlab, Seismology |
URL | http://www.thinkmind.org/index.php?view=article&articleid=advcomp_2017_4... |
Porting Tissue-Scale Cardiac Simulations to the Knights Landing Platform
In International Conference on High Performance Computing. Lecture Notes in Computer Science, Springer, 2017. Status: Published
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers |
Publication Type | Proceedings, refereed |
Year of Publication | 2017 |
Conference Name | International Conference on High Performance Computing |
Date Published | 10/2017 |
Publisher | Lecture Notes in Computer Science, Springer |
ISBN Number | 978-3-319-67629-6 |
DOI | 10.1007/978-3-319-67630-2_28 |
Journal Article
Accelerating Detailed Tissue-Scale 3D Cardiac Simulations Using Heterogeneous CPU-Xeon Phi Computing
International Journal of Parallel Programming (2016): 1-23. Status: Published
We investigate heterogeneous computing, which involves both multicore CPUs and manycore Xeon Phi coprocessors, as a new strategy for computational cardiology. In particular, 3D tissues of the human cardiac ventricle are studied with a physiologically realistic model that has 10,000 calcium release units per cell and 100 ryanodine receptors per release unit, together with tissue-scale simulations of the electrical activity and calcium handling. In order to attain resource-efficient use of heterogeneous computing systems that consist of both CPUs and Xeon Phis, we first direct the coding effort at ensuring good performance on the two types of compute devices individually. Although SIMD code vectorization is the main theme of performance programming, the actual implementation details differ considerably between CPU and Xeon Phi. Moreover, in addition to combined OpenMP+MPI programming, a suitable division of the cells between the CPUs and Xeon Phis is important for resource-efficient usage of an entire heterogeneous system. Numerical experiments show that good resource utilization is indeed achieved and that such a heterogeneous simulator paves the way for ultimately understanding the mechanisms of arrhythmia. The uncovered good programming practices can be used by computational scientists who want to adopt similar heterogeneous hardware platforms for a wide variety of applications.
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | International Journal of Parallel Programming |
Pagination | 1-23 |
Date Published | 10/2016 |
Publisher | ACM/Springer |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing, Xeon Phi |
DOI | 10.1007/s10766-016-0461-2 |
Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
International Journal of Parallel Programming (2016). Status: Published
We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data motion by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | International Journal of Parallel Programming |
Date Published | 10/2016 |
Publisher | ACM/Springer |
Keywords | code generation, code optimisation, CPU+GPU computing, CUDA, heterogeneous computing, MPI, OpenMP, source-to-source translation, stencil computation |
DOI | 10.1007/s10766-016-0454-1 |
Solving 3D Time-Fractional Diffusion Equations by High-Performance Parallel Computing
Fractional Calculus and Applied Analysis 19, no. 1 (2016): 140-160. Status: Published
Numerically solving time-fractional diffusion equations, especially in three space dimensions, is a daunting computational task. This is due to the huge requirements of both computation time and memory storage. Compared with solving integer-ordered diffusion equations, the costs for time and storage both increase by a factor that equals the number of time steps involved. Aiming to overcome these two obstacles, we study in this paper three programming techniques: loop unrolling, vectorization and parallelization. For a representative numerical scheme that adopts finite differencing and explicit time integration, the performance-enhancing techniques are indeed shown to dramatically reduce the computation time, while allowing the use of many CPU cores and thereby a large amount of memory storage. Moreover, we have developed simple-to-use performance models that support our empirical findings, which are based on using up to 8192 CPU cores and 12.2 terabytes.
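The reason why time and storage costs grow with the number of time steps can be seen in a minimal Python sketch of the L1 approximation of a Caputo time derivative of order alpha between 0 and 1. This is one common discretization, used here as an assumption; the abstract does not state which formula the paper adopts, and the function name `l1_caputo` is hypothetical. The sketch handles a single grid point. In 3D, the same history sum would be needed at every grid point.

```python
import math


def l1_caputo(history, dt, alpha):
    """L1 approximation of the Caputo derivative of order alpha (0 < alpha < 1)
    at the latest time level, given all stored values u^0, ..., u^n in `history`."""
    n = len(history) - 1
    total = 0.0
    for k in range(n):  # the sum runs over the entire solution history
        b_k = (k + 1) ** (1 - alpha) - k ** (1 - alpha)
        total += b_k * (history[n - k] - history[n - k - 1])
    return total / (math.gamma(2 - alpha) * dt ** alpha)


# For u(t) = t the exact Caputo derivative is t^(1 - alpha) / Gamma(2 - alpha),
# and the L1 formula reproduces it because u is linear in time.
dt, steps, alpha = 0.1, 10, 0.5
history = [k * dt for k in range(steps + 1)]
print(l1_caputo(history, dt, alpha), 1.0 / math.gamma(1.5))
```

Each new time level needs all earlier levels, so both the work per step and the stored data grow linearly with the step count, unlike an integer-order scheme that keeps only the most recent level or two.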
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2016 |
Journal | Fractional Calculus and Applied Analysis |
Volume | 19 |
Issue | 1 |
Pagination | 140-160 |
Publisher | DE GRUYTER |
Keywords | fractional differential equations, loop unrolling, parallel computing, vectorization |
URL | http://www.degruyter.com/view/j/fca.2016.19.issue-1/fca-2016-0008/fca-20... |
DOI | 10.1515/fca-2016-0008 |
Proceedings, refereed
Enabling Tissue-Scale Cardiac Simulations Using Heterogeneous Computing on Tianhe-2
In IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). ACM/IEEE, 2016. Status: Published
We develop a simulator for 3D tissue of the human cardiac ventricle with a physiologically realistic cell model and deploy it on the supercomputer Tianhe-2. In order to attain the full performance of the heterogeneous CPU-Xeon Phi design, we use carefully optimized codes for both devices and combine them to obtain suitable load balancing. Using a large number of nodes, we are able to perform tissue-scale simulations of the electrical activity and calcium handling in millions of cells, at a level of detail that tracks the states of trillions of ryanodine receptors. We can thus simulate arrhythmogenic spiral waves and other complex arrhythmogenic patterns which arise from calcium handling deficiencies in human cardiac ventricle tissue. Due to extensive code tuning and parallelization via OpenMP, MPI, and SCIF/COI, large scale simulations of 10 heartbeats can be performed in a matter of hours. Test results indicate excellent scalability, thus paving the way for detailed whole-heart simulations in future generations of leadership class supercomputers.
Affiliation | Scientific Computing |
Project(s) | User-friendly programming of GPU-enhanced clusters, Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) |
Pagination | 843-852 |
Date Published | 12/2016 |
Publisher | ACM/IEEE |
ISSN Number | 1521-9097 |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing, Xeon Phi |
DOI | 10.1109/ICPADS.2016.0114 |
Matlab2cpp: A Matlab-to-C++ code translator
In IEEE 2016 11th System of Systems Engineering Conference (SoSE). IEEE, 2016. Status: Published
This paper discusses the source-to-source Matlab2cpp translator, which is currently being developed in the EMC2 project. With help of user-supplied information about variable data types and a few special translation rules, Matlab code can be automatically translated into C++ code that makes use of the Armadillo C++ library. Preliminary tests with examples from the SeismicLab package have confirmed that this Matlab-to-C++ translator is indeed capable of handling realistic Matlab code. This tool thus has the potential of closing the gap between human-friendly experimentation offered by interactive Matlab scripting and performance-critical production runs that rely on C++ programming.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | IEEE 2016 11th System of Systems Engineering Conference (SoSE) |
Date Published | 06/2016 |
Publisher | IEEE |
Keywords | Armadillo, C++, Code translation, Matlab |
DOI | 10.1109/SYSOSE.2016.7542966 |
On the Performance and Energy Efficiency of the PGAS Programming Model on Multicore Architectures
In High Performance Computing & Simulation (2016) - International Workshop on Optimization of Energy Efficient HPC & Distributed Systems. ACM IEEE, 2016. Status: Published
Affiliation | Scientific Computing |
Project(s) | PREAPP: PRoductivity and Energy-efficiency through Abstraction-based Parallel Programming, Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | High Performance Computing & Simulation (2016) - International Workshop on Optimization of Energy Efficient HPC & Distributed Systems |
Date Published | 08/2016 |
Publisher | ACM IEEE |
URL | http://dx.doi.org/10.1109/HPCSim.2016.7568416 |
DOI | 10.1109/HPCSim.2016.7568416 |
Journal Article
An Analytical GPU Performance Model for 3D Stencil Computations from the Angle of Data Traffic
The Journal of Supercomputing 71, no. 7 (2015): 2433-2453. Status: Published
The achievable GPU performance of many scientific computations is not determined by a GPU's peak floating-point rate, but rather by how fast data are moved through different stages of the entire memory hierarchy. We take low-order 3D stencil computations as a representative class to study the reachable GPU performance from the angle of data traffic. Specifically, we propose a simple analytical model to estimate the execution time based on quantifying the data traffic volume at three stages: (1) between registers and on-SMX storage, (2) between on-SMX storage and L2 cache, (3) between L2 cache and GPU's device memory. Three associated granularities are used: a CUDA thread, a thread block, and a set of simultaneously active thread blocks. For four 3D stencil computations, NVIDIA's profiling tools have been used to verify the accuracy of the quantified data traffic volumes, by trying a large number of executions with different problem sizes and thread block configurations. Moreover, by introducing an imbalance coefficient, together with the known realistic memory bandwidths, we can predict the execution time usage based on the quantified data traffic volumes. For the four 3D stencils, the average error of the time prediction is 6.9% for a baseline implementation approach, whereas for a blocking implementation approach the average prediction error is 9.5%.
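The general idea of predicting time from data traffic can be sketched in a few lines of Python. This is a simplified bottleneck estimate, not the paper's model: it omits the imbalance coefficient and the per-thread and per-block granularities, and it assumes that transfers at the three stages overlap perfectly. The function names, the traffic multipliers, and the bandwidth figures are all hypothetical.

```python
def min_device_memory_traffic(nx, ny, nz, bytes_per_value=8):
    """Lower bound on device-memory traffic for one stencil sweep,
    assuming every grid value is read once and written once."""
    return 2 * bytes_per_value * nx * ny * nz


def estimate_time(traffic_bytes, bandwidths):
    """Bottleneck estimate: the stage with the largest volume-to-bandwidth ratio dominates."""
    return max(volume / bandwidth for volume, bandwidth in zip(traffic_bytes, bandwidths))


n = 256
dram = min_device_memory_traffic(n, n, n)

# Hypothetical volumes (bytes) and bandwidths (bytes/s) for the three stages:
# registers <-> on-chip storage, on-chip storage <-> L2 cache, L2 cache <-> device memory.
traffic = [4 * dram, 2 * dram, dram]
bandwidths = [1000e9, 500e9, 200e9]

print(f"estimated sweep time: {estimate_time(traffic, bandwidths) * 1e3:.3f} ms")
```

With these made-up numbers the device-memory stage is the bottleneck, so the estimate equals the device-memory volume divided by its bandwidth.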
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | The Journal of Supercomputing |
Volume | 71 |
Issue | 7 |
Pagination | 2433-2453 |
Date Published | 02/2015 |
Publisher | Springer |
ISSN | 0920-8542 |
Keywords | 3D stencil methods, GPU, performance modeling |
URL | http://link.springer.com/article/10.1007/s11227-015-1392-1 |
DOI | 10.1007/s11227-015-1392-1 |
Communication-Hiding Programming for Clusters with Multi-Coprocessor Nodes
Concurrency and Computation: Practice and Experience 27, no. 16 (2015): 4172-4185. Status: Published
Future exascale systems are expected to adopt compute nodes that incorporate many accelerators. To shed some light on the upcoming software challenge, this paper investigates the particular topic of programming clusters that have multiple Xeon Phi coprocessors in each compute node. A new offload approach is considered for intra-node communication, which combines Intel’s APIs of coprocessor offload infrastructure (COI) and symmetric communication interface (SCIF) for achieving low latency. While the conventional pragma-based offload approach allows simpler programming, the COI-SCIF approach has three advantages in (1) lower overhead associated with launching offloaded code, (2) higher data transfer bandwidths, and (3) more advanced asynchrony between computation and data movement. The low-level COI-SCIF approach is also shown to have benefits over the MPI-OpenMP counterpart, which belongs to the symmetric usage mode. Moreover, a hybrid programming strategy based on COI-SCIF is presented for joining the computational force of all CPUs and coprocessors, while realizing communication hiding. All the programming approaches are tested by a real-world 3D application, for which the COI-SCIF-based approach shows a performance advantage on Tianhe-2.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 27 |
Issue | 16 |
Pagination | 4172–4185 |
Date Published | 05/2015 |
Publisher | John Wiley & Sons, Ltd |
Keywords | hybrid programming, Intel Xeon Phi coprocessor, offload model, SCIF, Tianhe-2 |
Notes | Published online before print. |
DOI | 10.1002/cpe.3507 |
Enabling a Uniform OpenCL Device View for Heterogeneous Platforms
IEICE Transactions on Information and Systems E98-D, no. 4 (2015): 812-823. Status: Published
Aiming to ease the parallel programming for heterogeneous architectures, we propose and implement a high-level OpenCL runtime that conceptually merges multiple heterogeneous hardware devices into one virtual heterogeneous compute device (VHCD). Moreover, automated workload distribution among the devices is based on offline profiling, together with new programming directives that define the device-independent data access range per work-group. Therefore, an OpenCL program originally written for a single compute device can, after inserting a small number of programming directives, run efficiently on a platform consisting of heterogeneous compute devices. Performance is ensured by introducing the technique of virtual cache management, which minimizes the amount of host-device data transfer. Our new OpenCL runtime is evaluated by a diverse set of OpenCL benchmarks, demonstrating good performance on various configurations of a heterogeneous system.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | IEICE Transactions on Information and Systems |
Volume | E98-D |
Issue | 4 |
Pagination | 812-823 |
Date Published | 04/2015 |
Publisher | IEICE |
ISSN | 1745-1361 |
Keywords | automated workload distribution, data transfer minimization, heterogeneous devices, OpenCL, virtualized single device |
DOI | 10.1587/transinf.2014EDP7244 |
Parallel performance modeling of irregular applications in cell-centered finite volume methods over unstructured tetrahedral meshes
Journal of Parallel and Distributed Computing 76 (2015): 120-131. Status: Published
Finite volume methods are widely used numerical strategies for solving partial differential equations. This paper aims at obtaining a quantitative understanding of the achievable performance of the cell-centered finite volume method on 3D unstructured tetrahedral meshes, using traditional multicore CPUs as well as modern GPUs. By using an optimized implementation and a synthetic connectivity matrix that exhibits a perfect structure of equal-sized blocks lying on the main diagonal, we can closely relate the achievable computing performance to the size of these diagonal blocks. Moreover, we have derived a theoretical model for identifying characteristic levels of the attainable performance as a function of hardware parameters, based on which a realistic upper limit of the performance can be predicted accurately. For real-world tetrahedral meshes, the key to high performance lies in a reordering of the tetrahedra, such that the resulting connectivity matrix resembles a block diagonal form where the optimal size of the blocks depends on the hardware. Numerical experiments confirm that the achieved performance is close to the practically attainable maximum and it reaches 75% of the theoretical upper limit, independent of the actual tetrahedral mesh considered. From this, we develop a general model capable of identifying bottleneck performance of a system’s memory hierarchy in irregular applications.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 76 |
Pagination | 120-131 |
Date Published | 02/2015 |
Publisher | Elsevier |
DOI | 10.1016/j.jpdc.2014.10.005 |
Scalable heterogeneous CPU-GPU computations for unstructured tetrahedral meshes
IEEE Micro 35, no. 4 (2015): 6-15. Status: Published
A recent trend in modern high-performance computing environments is the introduction of powerful, energy-efficient hardware accelerators such as GPUs and Xeon Phi coprocessors. These specialized computing devices coexist with CPUs and are optimized for highly parallel applications. In regular computing-intensive applications with predictable data access patterns, these devices often far outperform CPUs and thus relegate the latter to pure control functions instead of computations. For irregular applications, however, the performance gap can be much smaller and is sometimes even reversed. Thus, maximizing the overall performance on heterogeneous systems requires making full use of all available computational resources, including both accelerators and CPUs.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | IEEE Micro |
Volume | 35 |
Issue | 4 |
Pagination | 6-15 |
Date Published | 07/2015 |
Publisher | ACM IEEE |
DOI | 10.1109/MM.2015.70 |
Towards Simulation of Subcellular Calcium Dynamics at Nanometre Resolution
International Journal of High Performance Computing Applications 29, no. 1 (2015): 51-63. Status: Published
Numerical simulation of subcellular dynamics with a resolution down to one nanometre can be an important tool for discovering the physiological cause of many heart diseases. The requirement of enormous computational power, however, has made such simulations prohibitive so far. By using up to 12,288 Intel Xeon Phi 31S1P coprocessors on the new hybrid cluster Tianhe-2, which is the new number one supercomputer of the world, we have achieved 1.27 Pflop/s in double precision, which brings us much closer to the nanometre resolution. This is the result of efficiently using the hardware on different levels: (1) a single Xeon Phi, (2) a single compute node that consists of a host and three coprocessors, and (3) a huge number of interconnected nodes. To overcome the challenge of programming Intel’s new Many Integrated Core (MIC) architecture, we have adopted techniques such as vectorization, hierarchical data blocking, register data reuse, offloading computations to the coprocessors, and pipelining computations with intra-/inter-node communications.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | International Journal of High Performance Computing Applications |
Volume | 29 |
Issue | 1 |
Pagination | 51-63 |
Publisher | SAGE |
DOI | 10.1177/1094342013514465 |
Proceedings, refereed
CPU+GPU Programming of Stencil Computations for Resource-Efficient Use of GPU Clusters
In IEEE 18th International Conference on Computational Science and Engineering. IEEE Computer Society, 2015. Status: Published
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU programming approach that combines MPI, OpenMP and CUDA. To effectively hide the overhead of various inter- and intra-node communications, a new level of task parallelism is introduced on top of the conventional data parallelism. Combined with a suitable workload division between the CPUs and GPUs, our CPU+GPU programming approach is able to fully utilize the different processing units. The programming details and achievable performance are exemplified by a widely used 3D 7-point stencil computation, which shows high performance and scaling in experiments using up to 64 CPU-GPU nodes.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | IEEE 18th International Conference on Computational Science and Engineering |
Pagination | 17-26 |
Date Published | 10/2015 |
Publisher | IEEE Computer Society |
Keywords | CPU+GPU computing, CUDA, GPU, MPI, stencil |
DOI | 10.1109/CSE.2015.33 |
Multi-GPU Implementations of Parallel 3D Sweeping Algorithms with Application to Geological Folding
In ICCS 2015. Elsevier, 2015. Status: Published
This paper studies the CUDA programming challenges of using multiple GPUs inside a single machine to carry out plane-by-plane updates in parallel 3D sweeping algorithms. In particular, care must be taken to mask the overhead of various data movements between the GPUs. Multiple OpenMP threads on the CPU side should be combined with multiple CUDA streams per GPU to hide the data transfer cost related to the halo computation on each 2D plane. Moreover, the technique of peer-to-peer data motion can be used to reduce the impact of 3D volumetric data shuffles that have to be done between mandatory changes of the grid partitioning. We have investigated the performance improvement of 2- and 4-GPU implementations that are applicable to 3D anisotropic front propagation computations related to geological folding. In comparison with a straightforward multi-GPU implementation, the overall performance improvement due to masking of data movements on four GPUs of the Fermi architecture was 23%. The corresponding improvement obtained on four Kepler GPUs was 47%.
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | ICCS 2015 |
Pagination | 1494-1503 |
Date Published | 06/2015 |
Publisher | Elsevier |
Keywords | 3D sweeping, anisotropic front propagation, CUDA programming, NVIDIA GPU, OpenMP |
DOI | 10.1016/j.procs.2015.05.339 |
Towards Detailed Tissue-Scale 3D Simulations of Electrical Activity and Calcium Handling in the Human Cardiac Ventricle
In The 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2015). Lecture Notes in Computer Science, Springer Verlag, 2015. Status: Published
We adopt a detailed human cardiac cell model, which has 10000 calcium release units, in connection with simulating the electrical activity and calcium handling at the tissue scale. This is a computationally intensive problem requiring a combination of efficient numerical algorithms and parallel programming. To this end, we use a method that is based on binomial distributions to collectively study the stochastic state transitions of the 100 ryanodine receptors inside every calcium release unit, instead of individually following each ryanodine receptor. Moreover, the implementation of the parallel simulator has incorporated optimizations in the form of code vectorization and removal of redundant calculations. Numerical experiments show very good parallel performance of the 3D simulator and demonstrate that various physiological behaviors are correctly reproduced. This work thus paves the way for high-fidelity 3D simulations of human ventricular tissues, with the ultimate goal of understanding the mechanisms of arrhythmia.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | The 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2015) |
Pagination | 79-92 |
Date Published | 11/2015 |
Publisher | Lecture Notes in Computer Science, Springer Verlag |
ISBN Number | 978-3-319-27136-1 |
Keywords | Calcium handling, multiscale cardiac tissue simulation, supercomputing |
URL | http://link.springer.com/chapter/10.1007/978-3-319-27137-8_7 |
DOI | 10.1007/978-3-319-27137-8_7 |
Poster
Dysfunctional Sarcoplasmic Reticulum Ca2+ Release Underlies Arrhythmogenic Triggers in Catecholaminergic Polymorphic Ventricular Tachycardia: A Simulation Study in a Human Ventricular Myocyte Model
In Gordon Research Conference on Cardiac Arrhythmia. Lucca, Italy: Gordon Research Conference on Cardiac Arrhythmia, 2015. Status: Published
Affiliation | Scientific Computing |
Publication Type | Poster |
Year of Publication | 2015 |
Secondary Title | Gordon Research Conference on Cardiac Arrhythmia |
Publisher | Gordon Research Conference on Cardiac Arrhythmia |
Place Published | Lucca, Italy |
Technical reports
Is PGAS ready for the challenge of energy efficiency? A study with the NAS benchmark.
Tromsø: UiT, 2015. Status: Published
In this study we compare the performance and power efficiency of Unified Parallel C (UPC), MPI and OpenMP by running a set of kernels from the NAS Benchmark. One of the goals of this study is to focus on the Partitioned Global Address Space (PGAS) model, in order to describe it and compare it to MPI and OpenMP. In particular, we consider the power efficiency, expressed in millions of operations per second per watt, as a criterion to evaluate the suitability of PGAS compared to MPI and OpenMP. Based on these measurements, we provide an analysis to explain the difference in performance between UPC, MPI, and OpenMP.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Technical reports |
Year of Publication | 2015 |
Publisher | UiT |
Place Published | Tromsø |
Keywords | MPI, NAS Benchmark, OpenMP, performance evaluation, PGAS, power efficiency, UPC |
URL | http://munin.uit.no/bitstream/handle/10037/8207/article.pdf?sequence=1&i... |
Book Chapter
Parallel Computing
In Encyclopedia of Applied and Computational Mathematics, 1129-1132. Springer Berlin Heidelberg, 2015. Status: Published
Parallel computing can be understood as solving a computational problem through collaborative use of multiple resources that belong to a parallel computer system. Here, a parallel system can be anything between a single multiprocessor machine and an Internet-connected cluster that is made up of hybrid compute nodes. There are two main motivations for adopting parallel computations. The first motivation is about reducing the computational time, because employing more computational units for solving the same problem usually results in lower wall-time usage. The second – and perhaps more important – motivation is the wish to obtain more details, which can arise from higher temporal and spatial resolutions, more advanced mathematical and numerical models, and more realizations.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Book Chapter |
Year of Publication | 2015 |
Book Title | Encyclopedia of Applied and Computational Mathematics |
Pagination | 1129-1132 |
Date Published | 11/2015 |
Publisher | Springer Berlin Heidelberg |
ISBN Number | 978-3-540-70528-4 |
DOI | 10.1007/978-3-540-70529-1_424 |
Talks, contributed
Arrhythmogenic Mechanisms and Therapeutic Targets for Catecholaminergic Polymorphic Ventricular Tachycardia: A Simulation Study in a Human Ventricular Myocyte
In Simula Research Laboratory, 2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2014 |
Location of Talk | Simula Research Laboratory |
Type of Talk | Cardiac Modeling Workshop |
Mathematical Modeling of Ca Handling and Computational Studies of Ca-related Arrhythmogenesis in Heart
In National University of Defense Technology, China. Changsha, China, 2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2014 |
Location of Talk | National University of Defense Technology, China |
Place Published | Changsha, China |
Type of Talk | Workshop |
Proceedings, refereed
Automated Transformation of GPU-Specific OpenCL Kernels Targeting Performance Portability on Multi-Core/Many-Core CPUs
In Proceedings of Euro-Par 2014. Vol. 8632. LNCS 8632. Berlin Heidelberg New York: Springer, 2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of Euro-Par 2014 |
Volume | 8632 |
Pagination | 210-221 |
Publisher | Springer |
Place Published | Berlin Heidelberg New York |
Keywords | Conference |
DOI | 10.1007/978-3-319-09873-9_18 |
Effective Multi-GPU Communication Using Multiple CUDA Streams and Threads
In 20th International Conference on Parallel and Distributed Systems (ICPADS 2014). IEEE, 2014. Status: Published
In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | 20th International Conference on Parallel and Distributed Systems (ICPADS 2014) |
Pagination | 981-986 |
Publisher | IEEE |
DOI | 10.1109/PADSW.2014.7097919 |
Heterogeneous CPU-GPU Computing for the Finite Volume Method on 3D Unstructured Meshes
In 20th International Conference on Parallel and Distributed Systems (ICPADS 2014). IEEE, 2014. Status: Published
A recent trend in modern high-performance computing environments is the introduction of accelerators such as GPUs and Xeon Phi coprocessors, i.e. specialized computing devices that are optimized for highly parallel applications and coexist with CPUs. In regular compute-intensive applications with predictable data access patterns, these devices often outperform traditional CPUs by far and thus relegate them to pure control functions instead of computations. For irregular applications, however, the gap in relative performance can be much smaller, and is sometimes even reversed. Thus, maximizing overall performance in such systems requires making full use of all available computational resources. In this paper, we study the attainable performance of the cell-centered finite volume method on 3D unstructured tetrahedral meshes using heterogeneous systems consisting of CPUs and multiple GPUs. Finite volume methods are widely used numerical strategies for solving partial differential equations. The advantages of using finite volumes include built-in support for conservation laws and suitability for unstructured meshes. Our focus lies in demonstrating how a workload distribution that maximizes overall performance can be derived from the actual performance attained by the different computing devices in the heterogeneous environment. We also highlight the dual role of partitioning software in reordering and partitioning the input mesh, thus giving rise to a new combined approach to partitioning.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | 20th International Conference on Parallel and Distributed Systems (ICPADS 2014) |
Pagination | 191-199 |
Publisher | IEEE |
DOI | 10.1109/PADSW.2014.7097808 |
Utilizing Multiple Xeon Phi Coprocessors on One Compute Node
In International Conference on Algorithms and Architectures for Parallel Processing. Vol. 8631. LNCS 8631. Berlin Heidelberg New York: Springer, 2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | International Conference on Algorithms and Architectures for Parallel Processing |
Volume | 8631 |
Pagination | 68-81 |
Publisher | Springer |
Place Published | Berlin Heidelberg New York |
DOI | 10.1007/978-3-319-11194-0_6 |
Poster
Cellular Arrhythmogenesis in CPVT in a computational model of cardiac ventricular myocyte
Maastricht, Netherlands: European Working Group of Cardiac Cellular Electrophysiology, 2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Poster |
Year of Publication | 2014 |
Date Published | 09/2014 |
Publisher | European Working Group of Cardiac Cellular Electrophysiology |
Place Published | Maastricht, Netherlands |
Cellular Arrhythmogenesis in CPVT in a Computational Model of Cardiac Ventricular Myocyte
Scandinavian Physiological Society Meeting, 2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Poster |
Year of Publication | 2014 |
Date Published | 08/2014 |
Place Published | Scandinavian Physiological Society Meeting |
Type of Work | Poster at Scandinavian Physiological Society Meeting |
Spontaneous Ca2+ Release and Ca2+ Waves Underlie Early and Delayed Afterdepolarizations, and Triggered Activity in Ryanodine Receptor Mutation associated with Catecholaminergic Polymorphic Ventricular Tachycardia
Scandinavian Physiological Society, 2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Poster |
Year of Publication | 2014 |
Date Published | 08/2014 |
Place Published | Scandinavian Physiological Society |
Spontaneous Ca2+ Release and Ca2+ Waves Underlie Early and Delayed Afterdepolarizations, and Triggered Activity, in Ryanodine Receptor Mutations Associated With Catecholaminergic Polymorphic Ventricular Tachycardia
2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Poster |
Year of Publication | 2014 |
Date Published | 08/2014 |
Keywords | Conference |
Journal Article
High Efficient Sedimentary Basin Simulations on Hybrid CPU-GPU Clusters
Cluster Computing 17 (2014): 359-369. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | Cluster Computing |
Volume | 17 |
Number | 2 |
Pagination | 359-369 |
DOI | 10.1007/s10586-013-0300-9 |
Performance Modeling of Serial and Parallel Implementations of the Fractional Adams-Bashforth-Moulton Method
Fractional Calculus and Applied Analysis 17 (2014): 617-637. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | Fractional Calculus and Applied Analysis |
Volume | 17 |
Number | 3 |
Pagination | 617-637 |
Time-Fractional Heat Equations and Negative Absolute Temperatures
Computers & Mathematics with Applications 67 (2014): 164-171. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | Computers & Mathematics with Applications |
Volume | 67 |
Number | 1 |
Pagination | 164-171 |
DOI | 10.1016/j.camwa.2013.11.007 |
Public outreach
Supercomputing-Enabled Study of Subcellular Calcium Dynamics
2014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Public outreach |
Year of Publication | 2014 |
Type of Work | Article in "meta" - a magazine published by the notur project |
Talks, invited
Adopting Heterogeneous Hardware Platforms for Scientific Computing
In Guest lecture at Technical University of Denmark, December 5, 2013. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2013 |
Location of Talk | Guest lecture at Technical University of Denmark, December 5 |
Introduction to Scientific Writing
In Intensive course given at National University of Defense Technology, China, October 17-19, 2013. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2013 |
Location of Talk | Intensive course given at National University of Defense Technology, China, October 17-19 |
Scientific Computing on Accelerator-Based Supercomputers
In Guest lecture at FFI, September 20, 2013. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2013 |
Location of Talk | Guest lecture at FFI, September 20 |
Journal Article
Balancing Efficiency and Accuracy for Sediment Transport Simulations
Computational Science & Discovery 6 (2013): 015011. Status: Published
Simulating multi-lithology sediment transport requires numerically solving a fully-coupled system of nonlinear partial differential equations. The most standard approach is to simultaneously update all the unknown fields at each time step. Such a fully-implicit strategy is computationally demanding due to the need for Newton-Raphson iterations, each having to set up and solve a large system of linearized algebraic equations. Fully-explicit numerical schemes that do not solve linear systems are possible to devise, but suffer from lower numerical stability and accuracy. If we count the total number of floating-point operations needed to achieve stable numerical solutions with a prescribed level of accuracy, the fully-implicit approach probably wins over its fully-explicit counterpart. However, the latter may nevertheless win in the overall computation time, because computers achieve higher hardware efficiency for simpler numerical computations. Adding to this competition, there are semi-implicit numerical schemes that lie between the two extremes. This paper has two novel contributions. First, we devise a new semi-implicit scheme that has second-order accuracy in the temporal direction. Second, and more importantly, we propose a simple prediction model for the overall computation time on multicore architectures, applicable to many numerical implementations. Based on performance prediction, appropriate numerical schemes can be chosen by considering accuracy, stability, and computing speed at the same time. Our methodology is tested by numerical experiments modeling the sediment transport in Monterey Bay.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2013 |
Journal | Computational Science & Discovery |
Volume | 6 |
Number | 1 |
Pagination | 015011 |
DOI | 10.1088/1749-4699/6/1/015011 |
Resource-Efficient Utilization of CPU/GPU-Based Heterogeneous Supercomputers for Bayesian Phylogenetic Inference
The Journal of Supercomputing 66 (2013): 364-380. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2013 |
Journal | The Journal of Supercomputing |
Volume | 66 |
Number | 1 |
Pagination | 364-380 |
DOI | 10.1007/s11227-013-0911-1 |
Simulating Cardiac Electrophysiology in the Era of GPU-Cluster Computing
IEICE Transactions on Information and Systems E96-D (2013): 2587-2595. Status: Published
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2013 |
Journal | IEICE Transactions on Information and Systems |
Volume | E96-D |
Number | 12 |
Pagination | 2587-2595 |
DOI | 10.1587/transinf.E96.D.2587 |
Talks, contributed
Mint: a User-Friendly C-to-CUDA Code Translator
In Talk given at SIAM CSE'13, February 25, 2013. Status: Published
Aiming at automated source-to-source code translation from C to CUDA, we have developed the Mint framework. Users only need to annotate serial C code with a few compiler directives, specifying host-device data transfers plus the parallelization depth and granularity of loop nests. Mint then generates CUDA code as output, while carrying out on-chip memory optimizations that will greatly benefit 3D stencil computations. Several real-world applications have been ported to GPU using Mint.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2013 |
Location of Talk | Talk given at SIAM CSE'13, February 25 |
Keywords | Conference |
Proceedings, refereed
On the GPU Performance of 3D Stencil Computations Implemented in OpenCL
In Proceedings of International Supercomputing Conference, ISC 2013. Vol. 7905. Lecture Notes in Computer Science 7905. Berlin Heidelberg New York: Springer, 2013. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | Proceedings of International Supercomputing Conference, ISC 2013 |
Volume | 7905 |
Pagination | 125-135 |
Publisher | Springer |
Place Published | Berlin Heidelberg New York |
Keywords | Conference |
DOI | 10.1007/978-3-642-38750-0_10 |
On the GPU Performance of Cell-Centered Finite Volume Method Over Unstructured Tetrahedral Meshes
In Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms. New York: ACM, 2013. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms |
Publisher | ACM |
Place Published | New York |
DOI | 10.1145/2535753.2535765 |
On the GPU-CPU Performance Portability of OpenCL for 3D Stencil Computations
In Proceedings of IEEE 19th International Conference on Parallel and Distributed Systems. Los Alamitos, California • Washington • Tokyo: IEEE, 2013. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | Proceedings of IEEE 19th International Conference on Parallel and Distributed Systems |
Pagination | 78-85 |
Publisher | IEEE |
Place Published | Los Alamitos, California • Washington • Tokyo |
Keywords | Conference |
DOI | 10.1109/ICPADS.2013.23 |
Performance of Sediment Transport Simulations on NVIDIA's Kepler Architecture
In The International Conference on Computational Science, ICCS 2013. Vol. 18. Procedia Computer Science 18. Elsevier, 2013. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | The International Conference on Computational Science, ICCS 2013 |
Volume | 18 |
Pagination | 1275-1281 |
Publisher | Elsevier |
Keywords | Conference |
DOI | 10.1016/j.procs.2013.05.294 |
Proceedings, refereed
A New Parallel 3D Front Propagation Algorithm for Fast Simulation of Geological Folds
In The International Conference on Computational Science, ICCS 2012. Vol. 9. Procedia Computer Science 9. Amsterdam: ICCS, 2012. Status: Published
We present a novel method for 3D anisotropic front propagation and apply it to the simulation of geological folding. The new iterative algorithm has a simple structure and abundant parallelism, and is easily adapted to multithreaded architectures using OpenMP. Moreover, we have used the automated C-to-CUDA source code translator, Mint, to achieve greatly enhanced computing speed on GPUs. Both OpenMP and CUDA implementations have been tested and benchmarked on several examples of 3D geological folding.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | The International Conference on Computational Science, ICCS 2012 |
Volume | 9 |
Pagination | 947-955 |
Publisher | ICCS |
Place Published | Amsterdam |
Keywords | Conference |
DOI | 10.1016/j.procs.2012.04.101 |
Efficient Implementations of the Adams-Bashforth-Moulton Method for Solving Fractional Differential Equations
In Proceedings of FDA'12, 2012. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | Proceedings of FDA'12 |
Keywords | Conference |
Using 1000+ GPUs and 10000+ CPUs for Sedimentary Basin Simulations
In Proceedings of IEEE Cluster 2012. Los Alamitos, California • Washington • Tokyo: IEEE, 2012. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | Proceedings of IEEE Cluster 2012 |
Pagination | 27-35 |
Publisher | IEEE |
Place Published | Los Alamitos, California • Washington • Tokyo |
Keywords | Conference |
DOI | 10.1109/CLUSTER.2012.37 |
Journal Article
Accelerating a 3D Finite-Difference Earthquake Simulation With a C-to-CUDA Translator
Computing in Science & Engineering 14 (2012): 48-59. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2012 |
Journal | Computing in Science & Engineering |
Volume | 14 |
Number | 3 |
Pagination | 48-59 |
DOI | 10.1109/MCSE.2012.44 |
Talks, invited
Elements of Scientific Computing
In 3-day intensive course given at National University of Defence Technology, China, October 16-18, 2012. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2012 |
Location of Talk | 3-day intensive course given at National University of Defence Technology, China, October 16-18 |
Scientific Computing Needs Supercomputers, But Also Something Else!
In Invited lecture at National University of Defence Technology, China, March 29, 2012. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2012 |
Location of Talk | Invited lecture at National University of Defence Technology, China, March 29 |
Public outreach
Simulating Basin Evolution on GPU-Enhanced Hybrid Supercomputers
2012. Status: Published
According to the Top500 list that was published in November 2011, three of the world's five most powerful supercomputers are GPU-enhanced clusters of multicore CPUs. This hardware trend is expected to prevail for the foreseeable future. It is therefore our intention to report here some experiences of using one such cutting-edge GPU-CPU cluster, when applied to simulations of sediment deposition in connection with basin evolution. Our observations are twofold: (1) Simple numerical algorithms are to be favored on homogeneous clusters of CPUs, and even more so on hybrid CPU-GPU clusters. (2) It is possible but challenging to utilize the computing power of both the CPU and GPU sides on a hybrid cluster.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Public outreach |
Year of Publication | 2012 |
Talks, contributed
Some Perspectives on High-Performance Computing in the Geosciences
In Computational Geoscience Workshop, Geilo, January 19, 2012. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2012 |
Location of Talk | Computational Geoscience Workshop, Geilo, January 19 |
Understanding the Performance of Stencil-Based Computations on Multicore CPU
In CBC Seminar series, 2012. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2012 |
Location of Talk | CBC Seminar series |
Talks, contributed
A Function-Centric Generic Framework for Parallelization
In Talk at CLS Workshop at UiO on April 13, 2011. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2011 |
Location of Talk | Talk at CLS Workshop at UiO on April 13 |
Efficient Computations of Initial-Value Problems Involving Fractional Derivatives
In Talk at the seminar on wave propagation in complex media, November 23, 2011. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2011 |
Location of Talk | Talk at the seminar on wave propagation in complex media, November 23 |
Study of the Computational Efficiency for Different Usages of Pythoning
In Talk at CLS Workshop at UiO on April 13, 2011. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2011 |
Location of Talk | Talk at CLS Workshop at UiO on April 13 |
Proceedings, refereed
An OpenMP-Enabled Parallel Simulator for Particle Transport in Fluid Flows
In Proceedings of the International Conference on Computational Science, ICCS 2011. Vol. 4. Procedia Computer Science 4. Elsevier Science, 2011. Status: Published
By using C/C++ programming and OpenMP parallelization, we implement a newly developed numerical strategy for simulating particle transport in sparsely particle-laden fluid flows. Because the chosen numerical framework is highly dynamic, the implementation needs to properly handle the moving, merging and splitting of a large number of particle lumps. We show that a careful division of the entire computational work into a set of distinctive tasks not only produces a clearly structured code, but also allows taskwise parallelization through appropriate use of OpenMP compiler directives. The performance of the OpenMP-enabled parallel simulator is tested on representative architectures of multicore-based shared memory, by running a large case of particle transport in a pipe flow. Attention is also given to a number of performance-critical features of the simulator.
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Proceedings of the International Conference on Computational Science, ICCS 2011 |
Volume | 4 |
Pagination | 1475-1484 |
Publisher | Elsevier Science |
DOI | 10.1016/j.procs.2011.04.160 |
Mint: Realizing CUDA Performance in 3D Stencil Methods With Annotated C
In Proceedings of the 25th International Conference on Supercomputing (ICS'11). ACM Press, 2011. Status: Published
We present Mint, a programming model that enables the non-expert to enjoy the performance benefits of hand coded CUDA without becoming entangled in the details. Mint targets stencil methods, which are an important class of scientific applications. We have implemented the Mint programming model with a source-to-source translator that generates optimized CUDA C from traditional C source. The translator relies on annotations to guide translation at a high level. The set of pragmas is small, and the model is compact and simple. Yet, Mint is able to deliver performance competitive with painstakingly hand-optimized CUDA. We show that, for a set of widely used stencil kernels in two and three dimensions, Mint realized 80% of the performance obtained from aggressively optimized CUDA on the 200 series NVIDIA GPUs. Our optimizations target three dimensional kernels, which present a daunting array of optimizations.
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Proceedings of the 25th International Conference on Supercomputing (ICS'11) |
Pagination | 214-224 |
Publisher | ACM Press |
ISBN Number | 978-1-4503-0102-2 |
DOI | 10.1145/1995896.1995932 |
Talks, invited
Parallel Simulation of Particle Transport Using OpenMP
In Guest lecture at UCSD on January 31, 2011. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2011 |
Location of Talk | Guest lecture at UCSD on January 31 |
Programming With OpenMP and Mixed MPI-OpenMP
In Invited lecture during USIT's Research Computing Services training week, November 14-17, 2011. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2011 |
Location of Talk | Invited lecture during USIT's Research Computing Services training week, November 14-17 |
Programming With OpenMP and Mixed MPI-OpenMP
In Invited lecture at pre-conference workshop of NOTUR 2011, 2011. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, invited |
Year of Publication | 2011 |
Location of Talk | Invited lecture at pre-conference workshop of NOTUR 2011 |
Journal Article
Stability of Two Time-Integrators for the Aliev-Panfilov System
International Journal of Numerical Analysis and Modeling 8 (2011): 427-442. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2011 |
Journal | International Journal of Numerical Analysis and Modeling |
Volume | 8 |
Number | 3 |
Pagination | 427-442 |
Talks, invited
A Non-Invasive Approach to Parallelizing Sequential Simulators of Partial Differential Equations
In Guest lecture at UCSD on October 28, 2010. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, invited |
Year of Publication | 2010 |
Location of Talk | Guest lecture at UCSD on October 28 |
Journal Article
Computational Modeling of the Initiation and Development of Spontaneous Intracellular Ca2+ Waves in Ventricular Myocytes
Philosophical Transactions of the Royal Society A 368 (2010): 3953-3965. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2010 |
Journal | Philosophical Transactions of the Royal Society A |
Volume | 368 |
Number | 1925 |
Pagination | 3953-3965 |
Date Published | August |
DOI | 10.1098/rsta.2010.0146 |
Simplifying the Parallelization of Scientific Codes by a Function-Centric Approach in Python
Computational Science & Discovery 3 (2010): 015003. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2010 |
Journal | Computational Science & Discovery |
Volume | 3 |
Pagination | 015003 |
DOI | 10.1088/1749-4699/3/1/015003 |
Talks, contributed
Detailed Numerical Analyses of the Aliev-Panfilov Model on GPGPU
In Talk at PARA2010 Conference, 2010. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | Talk at PARA2010 Conference |
OpenMP: an Easy Parallel Approach for Scientific Computing on Multi-Core Architecture
In A short course respectively given at Simula in March and University of Oslo in May, 2010. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | A short course respectively given at Simula in March and University of Oslo in May |
Optimizing the Aliev-Panfilov Model of Cardiac Excitation on Heterogeneous Systems
In Talk at Para 2010: State of the Art in Scientific and Parallel Computing in Reykjavik on June 6-9, 2010, 2010. Status: Published
The Aliev-Panfilov model is a simple model for signal propagation in cardiac tissue, and accounts for complex behavior such as how spiral waves break up and form elaborate patterns. Spiral waves can lead to life-threatening situations such as ventricular fibrillation. We discuss an implementation and underlying optimizations for the nVIDIA Tesla C1060 GPU as well as an implementation on multiple GPUs running under MPI. We achieve nearly perfect scaling on 4 GPUs, in single precision, running 58 times faster than a CPU-only implementation and 26 times faster in double precision.
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | Talk at Para 2010: State of the Art in Scientific and Parallel Computing in Reykjavik on June 6-9, 2010 |
Parallel Programming Using Python
In CBC Seminar on advanced use of Python programming language, 2010. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | CBC Seminar on advanced use of Python programming language |
Book
Elements of Scientific Computing
Berlin / Heidelberg: Springer, 2010. Status: Published
Affiliation | Scientific Computing, Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Book |
Year of Publication | 2010 |
Date Published | October |
Publisher | Springer |
Place Published | Berlin / Heidelberg |
ISBN Number | 978-3-642-11298-0 |
DOI | 10.1007/978-3-642-1129 |
Proceedings, refereed
Numerical Analysis of a Dual-Sediment Transport Model Applied to Lake Okeechobee, Florida
In Proceedings of the 2010 Ninth International Symposium on Parallel and Distributed Computing. ISPDC '10. IEEE Computer Society Press, 2010. Status: Published
In this work, we study two numerical strategies for solving a coupled system of distinct nonlinear partial differential equations, which can be used to model dual-lithology sedimentation. Using high-resolution bathymetry data of Lake Okeechobee, Florida, we study the stability and computational speed of these numerical strategies. The fully-explicit scheme is straightforward to implement and requires a relatively small amount of computation per time step. However, this simple numerical strategy has to use small time steps to ensure stability. These small time steps may render the explicit solver impractical for long-term and high-resolution basin simulations. As a comparison, we have implemented a semi-implicit scheme, where the two partial differential equations at each time step are solved implicitly in sequence. This semi-implicit scheme is numerically stable even for very large time steps. Using parallel computing, we have applied both schemes to a realistic case, Lake Okeechobee, Florida. The simulation successfully diffused material along a river-channel and into the lake. Both MPI-based implementations demonstrated satisfactory parallel efficiency on a multicore-based cluster.
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2010 |
Conference Name | Proceedings of the 2010 Ninth International Symposium on Parallel and Distributed Computing |
Pagination | 189-194 |
Publisher | IEEE Computer Society Press |
ISBN Number | 978-1-4244-7602-2 |
DOI | 10.1109/ISPDC.2010.29 |
Book Chapter
Parallel Computing Engines for Subsurface Imaging Technologies
In Advanced Computational Infrastructures for Parallel and Distributed Adaptive Applications, 29-43. Wiley Series of Parallel and Distributed Computing. Hoboken, New Jersey: Wiley, 2010. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Book Chapter |
Year of Publication | 2010 |
Book Title | Advanced Computational Infrastructures for Parallel and Distributed Adaptive Applications |
Secondary Title | Wiley Series of Parallel and Distributed Computing |
Chapter | 3 |
Pagination | 29-43 |
Publisher | Wiley |
Place Published | Hoboken, New Jersey |
ISBN Number | 978-0-470-07294-3 |
Journal Article
A Multilevel Approach for the Satisfiability Problem
ISAST Transactions on Computers and Intelligent Systems 1 (2009): 29-37. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2009 |
Journal | ISAST Transactions on Computers and Intelligent Systems |
Volume | 1 |
Number | 2 |
Pagination | 29-37 |
A Study on Modified Szabo's Wave Equation Modeling of Frequency-Dependent Dissipation in Ultrasonic Medical Imaging
Physica Scripta 2009 (2009): 014014. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2009 |
Journal | Physica Scripta |
Volume | 2009 |
Number | T136 |
Pagination | 014014 |
DOI | 10.1088/0031-8949/2009/T136/014014 |
Analysis of Tracer Tomography Using Temporal Moments of Tracer Breakthrough Curves
Advances in Water Resources 32 (2009): 391-400. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2009 |
Journal | Advances in Water Resources |
Volume | 32 |
Number | 3 |
Pagination | 391-400 |
DOI | 10.1016/j.advwatres.2008.12.001 |
Towards a Computational Method for Imaging the Extracellular Potassium Concentration During Regional Ischemia
Mathematical Biosciences 220 (2009): 118-130. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2009 |
Journal | Mathematical Biosciences |
Volume | 220 |
Number | 2 |
Pagination | 118-130 |
DOI | 10.1016/j.mbs.2009.05.004 |
Proceedings, refereed
Evolution of Intracellular Ca2+ Waves From About 10,000 RyR Clusters: Towards Solving a Computationally Daunting Task
In The Fifth International Conference on Functional Imaging and Modeling of the Heart. Lecture Notes in Computer Science, vol. 5528. Springer, 2009. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2009 |
Conference Name | The Fifth International Conference on Functional Imaging and Modeling of the Heart |
Pagination | 11-20 |
Publisher | Springer |
DOI | 10.1007/978-3-642-01932-6 |
Poster
Parallel Simulation of Dual Lithology Sedimentation
2009. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Poster |
Year of Publication | 2009 |
Notes | Second prize winner of the poster competition at the conference. |
Book Chapter
Past and Future Perspectives on Scientific Software
In Simula Research Laboratory - by thinking constantly about it, 321-362. Heidelberg: Springer, 2009. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Book Chapter |
Year of Publication | 2009 |
Book Title | Simula Research Laboratory - by thinking constantly about it |
Chapter | 23 |
Pagination | 321-362 |
Publisher | Springer |
Place Published | Heidelberg |
ISBN Number | 978-3-642-01155-9 |
Book Chapter
A Multilevel Greedy Algorithm for the Satisfiability Problem
In Advances in Greedy Algorithms, 39-54. Vienna: IN-TECH Education and Publishing, 2008. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Book Chapter |
Year of Publication | 2008 |
Book Title | Advances in Greedy Algorithms |
Chapter | 3 |
Pagination | 39-54 |
Publisher | IN-TECH Education and Publishing |
Place Published | Vienna |
ISBN Number | 978-953-7619-27-5 |
Journal Article
A View Toward the Future of Subsurface Characterization: CAT Scanning Groundwater Basins
Water Resources Research 44 (2008). Status: Published
In this opinion paper we contend that high-resolution characterization, monitoring, and prediction are the key elements to advancing and reducing uncertainty in our understanding and prediction of subsurface processes at basin scales. First, we advocate that recently developed tomographic surveying is an effective and high-resolution approach for characterizing the field-scale subsurface. Fusion of different types of tomographic surveys further enhances the characterization. A basin is an appropriate scale for many water resources management purposes. We thereby propose the expansion of the tomographic surveying and data fusion concept to basin-scale characterization. In order to facilitate basin-scale tomographic surveys, different types of passive, basin-scale, CAT scan technologies are suggested that exploit recurrent natural stimuli (e.g., lightning, earthquakes, storm events, barometric variations, river-stage variations, etc.) as sources of excitations, along with implementation of sensor networks that provide long-term and spatially distributed monitoring of excitation as well as response signals on the land surface and in the subsurface. This vision for basin-scale subsurface characterization faces many significant technological challenges and requires interdisciplinary collaborations (e.g., surface and subsurface hydrology, geophysics, geology, geochemistry, information and sensor technology, applied mathematics, atmospheric science, etc.). We nevertheless contend that this should be a future direction for subsurface science research.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2008 |
Journal | Water Resources Research |
Volume | 44 |
Notes | Citation number: W03301 |
DOI | 10.1029/2007WR006375 |
Talks, contributed
High-Performance Computing on Distributed-Memory Architecture
In Lecture at the 2008 Winter School on Parallel Computing, Jan. 20-25, Geilo, Norway, 2008. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2008 |
Location of Talk | Lecture at the 2008 Winter School on Parallel Computing, Jan. 20-25, Geilo, Norway |
Parallel Computing; Why & How?
In Lecture at the 2008 Winter School on Parallel Computing, Jan. 20-25, Geilo, Norway, 2008. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2008 |
Location of Talk | Lecture at the 2008 Winter School on Parallel Computing, Jan. 20-25, Geilo, Norway |
Resource-Efficient Simulation Of Tsunami Wave Propagation on Parallel Computers
In Invited talk at 2nd International Symposium for Integrated Predictive Simulation System for Earthquake and Tsunami Disaster, October 21-22, Tokyo, Japan, 2008. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2008 |
Location of Talk | Invited talk at 2nd International Symposium for Integrated Predictive Simulation System for Earthquake and Tsunami Disaster, October 21-22, Tokyo, Japan |
Simulation of Tsunami Propagation
In Talk at the 2nd eScience Meeting, Jan. 21-22, Geilo, Norway, 2008. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2008 |
Location of Talk | Talk at the 2nd eScience Meeting, Jan. 21-22, Geilo, Norway |
Use of Advanced Computing in Tomographic Surveys
In Talk at PARA 2008, May 13-16, Trondheim, Norway, 2008. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2008 |
Location of Talk | Talk at PARA 2008, May 13-16, Trondheim, Norway |
Proceedings, refereed
On the Efficiency of Python for High-Performance Computing: a Case Study Involving Stencil Updates for Partial Differential Equations
In Modeling, Simulation and Optimization of Complex Processes. LNCSE. Springer, 2008. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2008 |
Conference Name | Modeling, Simulation and Optimization of Complex Processes |
Pagination | 337-358 |
Publisher | Springer |
ISBN Number | 978-3-540-23027-4 |
Edited books
Quantitative Information Fusion for Hydrological Sciences
Vol. 79 in Studies in Computational Intelligence. Springer, 2008. Status: Published
In a rapidly evolving world of knowledge and technology, do you ever wonder how hydrology is catching up? This book takes the angle of computational hydrology and envisions one of the future directions, namely, quantitative integration of high-quality hydrologic field data with geologic, hydrologic, chemical, atmospheric, and biological information to characterize and predict natural systems in hydrological sciences. Intelligent computation and information fusion are the key words. The aim is to provide both established scientists and graduate students with a summary of recent developments in this topic. The chapters of this edited volume cover some of the most important ingredients for quantitative hydrological information fusion, including data fusion techniques, interactive computational environments, and supporting mathematical and numerical methods. Real-life applications of hydrological information fusion are also addressed.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Edited books |
Year of Publication | 2008 |
Volume | 79 in Studies in Computational Intelligence |
Date Published | January, 2008 |
Publisher | Springer |
ISBN Number | 978-3-540-75383-4 |
Journal Article
A Note on the Efficiency of the Conjugate Gradient Method for a Class of Time-Dependent Problems
Numerical Linear Algebra with Applications 14 (2007): 459-467. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2007 |
Journal | Numerical Linear Algebra with Applications |
Volume | 14 |
Number | 5 |
Pagination | 459-467 |
A Unified Framework of Multi-Objective Cost Functions for Partitioning Unstructured Finite Element Meshes
Applied Mathematical Modelling 31 (2007): 1711-1728. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2007 |
Journal | Applied Mathematical Modelling |
Volume | 31 |
Number | 9 |
Pagination | 1711-1728 |
An Order Optimal Solver for the Discretized Bidomain Equations
Numerical Linear Algebra with Applications 14 (2007): 83-98. Status: Published
Affiliation | Scientific Computing, Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2007 |
Journal | Numerical Linear Algebra with Applications |
Volume | 14 |
Number | 2 |
Pagination | 83-98 |
On the Possibility for Computing the Transmembrane Potential in the Heart With a One Shot Method; an Inverse Problem
Mathematical Biosciences 210 (2007): 523-553. Status: Published
Affiliation | Scientific Computing, Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Journal Article |
Year of Publication | 2007 |
Journal | Mathematical Biosciences |
Volume | 210 |
Number | 2 |
Pagination | 523-553 |
Talks, contributed
Bridging the Gap Between Computational Scientists and HPC
In Article published in Meta, Number 3, 2007. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2007 |
Location of Talk | Article published in Meta, Number 3 |
Building Hybrid Parallel PDE Software by Domain Decomposition and Object-Oriented Programming
In Talk at the ICCM 2007 Conference, April 4-6, Hiroshima, Japan, 2007. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2007 |
Location of Talk | Talk at the ICCM 2007 Conference, April 4-6, Hiroshima, Japan |
Making Parallel PDE Software by Object-Oriented Programming
In Guest lecture given at Hohai University, China, May 17, 2007. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2007 |
Location of Talk | Guest lecture given at Hohai University, China, May 17 |
On a Future Software Platform for Demanding Multi-Scale and Multi-Physics Problems
In Talk at SIAM CSE07 Conference, Costa Mesa, CA, Feb. 19-23, 2007. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2007 |
Location of Talk | Talk at SIAM CSE07 Conference, Costa Mesa, CA, Feb. 19-23 |
On Building Parallel Algorithms and Software for Hydraulic Tomography
In Talk at SIAM GS2007 Conference, March 19-22, Santa Fe, New Mexico, USA, 2007. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2007 |
Location of Talk | Talk at SIAM GS2007 Conference, March 19-22, Santa Fe, New Mexico, USA |
Parallelisation and Numerical Performance of a 3D Model for Coupled Deformation, Fluid Flow, and Heat Transport in Porous Geological Formations
In Talk at the Fourth National Conference on Computational Mechanics (MekIT'07), Trondheim, Norway, 2007. Status: Published
In this paper, we present some parallel performance results for a 3D simulator of coupled deformation, fluid flow and heat transfer in sedimentary basins. The model parameters are derived from an industry simulator, with realistic material properties and complex irregular grids of up to 1.5 million nodes with 7.3 million degrees of freedom. We have performed parallelisation on the linear algebra level using the ML algebraic multigrid preconditioner with iterative methods in the Diffpack finite element framework. Implementation and speedup results are presented.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2007 |
Location of Talk | Talk at the Fourth National Conference on Computational Mechanics (MekIT'07), Trondheim, Norway |
Notes | Presented by J. B. Haga |
Simulating Tsunami Propagation on Parallel Computers Using a Hybrid Software Framework
In Guest lecture given at the University of Stuttgart, March 12, 2007. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Talks, contributed |
Year of Publication | 2007 |
Location of Talk | Guest lecture given at the University of Stuttgart, March 12 |
Proceedings, refereed
Making Hybrid Tsunami Simulators in a Parallel Software Framework
In International Workshop on Applied Parallel Computing (PARA'06). Lecture Notes in Computer Science, volume 4699. Berlin Heidelberg: Springer Verlag, 2007. Status: Published
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2007 |
Conference Name | International Workshop on Applied Parallel Computing (PARA'06) |
Pagination | 686-693 |
Publisher | Springer Verlag |
Place Published | Berlin Heidelberg |
ISBN Number | 978-3-540-75754-2 |
Parallelisation and Numerical Performance of a 3D Model for Coupled Deformation, Fluid Flow and Heat Transfer in Sedimentary Basins
In MekIT'07. Fourth National Conference on Computational Mechanics. Trondheim: Tapir Academic Press, 2007. Status: Published
In this paper, we present some parallel performance results for a 3D simulator of coupled deformation, fluid flow and heat transfer in sedimentary basins. The model parameters are derived from an industry simulator, with realistic material properties and complex irregular grids of up to 1.5 million nodes with 7.3 million degrees of freedom. We have performed parallelisation on the linear algebra level using the ML algebraic multigrid preconditioner with iterative methods in the Diffpack finite element framework. Implementation and speedup results are presented.
Affiliation | Scientific Computing |
Project(s) | Center for Biomedical Computing (SFF) |
Publication Type | Proceedings, refereed |
Year of Publication | 2007 |
Conference Name | MekIT'07. Fourth National Conference on Computational Mechanics |
Pagination | 151-162 |
Date Published | May |
Publisher | Tapir Academic Press |
Place Published | Trondheim |
ISBN Number | 978-82-519-2235-7 |
Talks, contributed
A Hybrid Software Framework for Parallel Tsunami Simulations
In Talk at SIAM PP06 Conference, February 22-24, 2006, San Francisco, 2006. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Talk at SIAM PP06 Conference, February 22-24, 2006, San Francisco |
Computational Issues in Heart Modeling
In Presented at the Johann Radon Institute for Computational and Applied Mathematics, Linz, Austria, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Presented at the Johann Radon Institute for Computational and Applied Mathematics, Linz, Austria |
Fusion of Hydraulic and Tracer Tomography for DNAPL Detection
In Poster presented at AGU Fall Meeting 2006, Dec. 11-15, San Francisco, 2006. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Poster presented at AGU Fall Meeting 2006, Dec. 11-15, San Francisco |
Hybrid Parallelization of a 3D Transient Hydraulic Tomography Code
In Poster presented at Western Pacific Geophysics Meeting 2006, Beijing, July 24-27, 2006. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Poster presented at Western Pacific Geophysics Meeting 2006, Beijing, July 24-27
On the Use of the Bidomain Equations for Computing the Transmembrane Potential Throughout the Heart Wall: an Inverse Problem
In Presented at the Computers in Cardiology conference in Valencia, Spain, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Presented at the Computers in Cardiology conference in Valencia, Spain |
Parallel Computational Methodology for Hydraulic Tomography
In Poster presented at AGU Fall Meeting 2006, San Francisco, Dec. 11-15, 2006. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Poster presented at AGU Fall Meeting 2006, San Francisco, Dec. 11-15 |
Parallel Programming and Computing for Large-Scale Hydraulic Tomography
In Poster presented at Workshop on Hydraulic Tomography, Boise, June 8-9, 2006. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Poster presented at Workshop on Hydraulic Tomography, Boise, June 8-9 |
Parallelizing Serial PDE Software Using a Generic Approach
In Seminar at the University of Arizona, February 27, 2006. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Seminar at the University of Arizona, February 27 |
Python in High Performance Computing
In Tutorial presented at the Para06 Workshop, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Tutorial presented at the Para06 Workshop |
Simulating Tsunamis on Parallel Computers
In Invited talk at Notur 2006 Conference, May 11-12, Bergen, Norway, 2006. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2006 |
Location of Talk | Invited talk at Notur 2006 Conference, May 11-12, Bergen, Norway |
Book
Computing the Electrical Activity in the Heart
Berlin Heidelberg: Springer, 2006. Status: Published
This book describes mathematical models and numerical techniques for simulating the electrical activity in the heart. The book gives an introduction to the most important models of the field, followed by a detailed description of numerical techniques for the models. Particular focus is on efficient numerical methods for large scale simulations on both scalar and parallel computers. The results presented in the book will be of particular interest to researchers in bioengineering and computational biology, who face the challenge of solving these complex mathematical models efficiently. The book will also serve as a valuable introduction to a new and exciting field for computational scientists and applied mathematicians.
Affiliation | Scientific Computing |
Publication Type | Book |
Year of Publication | 2006 |
Publisher | Springer |
Place Published | Berlin Heidelberg |
ISBN Number | 3-540-33432-7 |
Book Chapter
Full-Scale Simulation of Cardiac Electrophysiology on Parallel Computers
In Numerical Solution of Partial Differential Equations on Parallel Computers, 385-411. Lecture Notes in Computational Science and Engineering. Springer, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Book Chapter |
Year of Publication | 2006 |
Book Title | Numerical Solution of Partial Differential Equations on Parallel Computers |
Secondary Title | Lecture Notes in Computational Science and Engineering |
Pagination | 385-411 |
Publisher | Springer |
Parallelizing PDE Solvers Using the Python Programming Language
In Numerical Solution of Partial Differential Equations on Parallel Computers, 295-325. Lecture Notes in Computational Science and Engineering. Springer, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Book Chapter |
Year of Publication | 2006 |
Book Title | Numerical Solution of Partial Differential Equations on Parallel Computers |
Secondary Title | Lecture Notes in Computational Science and Engineering |
Pagination | 295-325 |
Publisher | Springer |
Proceedings, non-refereed
Identifying Ischemic Heart Disease in Terms of ECG Recordings and an Inverse Problem for the Bidomain Equations; Modeling and Experiments
In The Third International Conference "Inverse Problems: Modeling and Simulation". Literatür Yayincilik Ltd, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, non-refereed |
Year of Publication | 2006 |
Conference Name | The Third International Conference "Inverse Problems: Modeling and Simulation" |
Pagination | 138-140 |
Publisher | Literatür Yayincilik Ltd. |
ISBN Number | 975-04-0381-9 |
Proceedings, refereed
Improving the Performance of Large-Scale Unstructured PDE Applications
In Proceedings of the PARA'04 Workshop, June 20-23, 2004, Lyngby, Denmark. Lecture Notes in Computer Science, volume 3732. Springer, 2006. Status: Published
Publication Type | Proceedings, refereed |
Year of Publication | 2006 |
Conference Name | Proceedings of the PARA'04 Workshop, June 20-23, 2004, Lyngby, Denmark |
Pagination | 699-708 |
Publisher | Springer |
ISBN Number | 3-540-29067-2 |
On the Use of the Bidomain Equations for Computing the Transmembrane Potential Throughout the Heart Wall: an Inverse Problem
In Computers in Cardiology 2006. Computers in Cardiology, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2006 |
Conference Name | Computers in Cardiology 2006 |
Pagination | 797-800 |
Publisher | Computers in Cardiology |
ISSN Number | 0276-6547 |
Parallel Simulation of Tsunamis Using a Hybrid Software Approach
In Proceedings of the International Conference ParCo 2005, September 13-16, Malaga, Spain. Volume 33 in NIC series. John von Neumann Institute for Computing, 2006. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2006 |
Conference Name | Proceedings of the International Conference ParCo 2005, September 13-16, Malaga, Spain |
Pagination | 383-390 |
Publisher | John von Neumann Institute for Computing |
ISBN Number | 3-00-017352-8 |
Journal Article
On the Computational Complexity of the Bidomain and the Monodomain Models of Electrophysiology
Annals of Biomedical Engineering 34 (2006): 1088-1097. Status: Published
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2006 |
Journal | Annals of Biomedical Engineering |
Volume | 34 |
Number | 7 |
Pagination | 1088-1097 |
Date Published | July |
Journal Article
A Numerical Method for Computing the Profile of Weld Pool Surfaces
International Journal for Computational Methods in Engineering Science and Mechanics 6 (2005): 115-125. Status: Published
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2005 |
Journal | International Journal for Computational Methods in Engineering Science and Mechanics |
Volume | 6 |
Number | 2 |
Pagination | 115-125 |
A Parallel Multi-Subdomain Strategy for Solving Boussinesq Water Wave Equations
Advances in Water Resources 28 (2005): 215-233. Status: Published
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2005 |
Journal | Advances in Water Resources |
Volume | 28 |
Number | 3 |
Pagination | 215-233 |
Date Published | March |
On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations
Scientific Programming 13 (2005): 31-56. Status: Published
Affiliation | Scientific Computing |
Publication Type | Journal Article |
Year of Publication | 2005 |
Journal | Scientific Programming |
Volume | 13 |
Number | 1 |
Pagination | 31-56 |
Technical reports
An Order Optimal Solver for the Discretized Bidomain Equations
Simula Research Laboratory, 2005. Status: Published
Affiliation | Scientific Computing |
Project(s) | No Simula project |
Publication Type | Technical reports |
Year of Publication | 2005 |
Publisher | Simula Research Laboratory |
Notes | This technical report is an earlier version of a journal article. The journal article can be found here: https://www.simula.no/publications/order-optimal-solver-discretized-bidomain-equations |
Talks, contributed
Parallel Simulation of Tsunamis Using a Hybrid Software Approach
In Talk at ParCo 2005 Conference, 13 - 16 September, Malaga, Spain, 2005. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2005 |
Location of Talk | Talk at ParCo 2005 Conference, 13 - 16 September, Malaga, Spain |
Parallelization of PDE Codes
In Talk at the CMA Workshop on High-Performance Computing in Physics, November 4, Oslo, Norway, 2005. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2005 |
Location of Talk | Talk at the CMA Workshop on High-Performance Computing in Physics, November 4, Oslo, Norway |
Solving Boussinesq Water Wave Equations on Parallel Computers
In Talk at the International Workshop on Numerical Ocean Modeling, Oslo, Norway, 2005. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2005 |
Location of Talk | Talk at the International Workshop on Numerical Ocean Modeling, Oslo, Norway |
Book Chapter
A Numerical Study of Some Parallel Algebraic Preconditioners
In Parallel and Distributed Scientific and Engineering Computing: Practice and Experience, 9-21. Nova Science Publishers, 2004. Status: Published
Publication Type | Book Chapter |
Year of Publication | 2004 |
Book Title | Parallel and Distributed Scientific and Engineering Computing: Practice and Experience |
Pagination | 9-21 |
Publisher | Nova Science Publishers |
Notes | An earlier version is included in Proceedings of the IPDPS 2003 Conference, Nice, France, April 2003, IEEE Computer Society |
Parallel Solution of the Bidomain Equations With High Resolutions
In Parallel Computing: Software Technology, Algorithms, Architectures & Applications, 837-844. Elsevier Science, 2004. Status: Published
Affiliation | Scientific Computing |
Publication Type | Book Chapter |
Year of Publication | 2004 |
Book Title | Parallel Computing: Software Technology, Algorithms, Architectures & Applications |
Pagination | 837-844 |
Publisher | Elsevier Science |
Talks, contributed
Using Linux Clusters for Full-Scale Simulation of Cardiac Electrophysiology
In Invited talk at the fifth annual workshop on Linux Clusters for Super Computing, October 18-21, 2004, Linköping, Sweden, 2004. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2004 |
Location of Talk | Invited talk at the fifth annual workshop on Linux Clusters for Super Computing, October 18-21, 2004, Linköping, Sweden |
Journal Article
Using the Parallel Algebraic Recursive Multilevel Solver in Modern Physical Applications
Future Generation Computer Systems 20 (2004): 489-500. Status: Published
Publication Type | Journal Article |
Year of Publication | 2004 |
Journal | Future Generation Computer Systems |
Volume | 20 |
Number | 3 |
Pagination | 489-500 |
Notes | An earlier version appeared as Technical Report 2002-106 at the Minnesota Supercomputing Institute |
Proceedings, refereed
A Flexible Architecture for Welding Simulators Used in Weld Planning
In Proceedings of International Conference on Productive Welding in Industrial Applications. Lappeenranta, Finland, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2003 |
Conference Name | Proceedings of International Conference on Productive Welding in Industrial Applications |
Date Published | May |
Place Published | Lappeenranta, Finland |
Talks, contributed
A Numerical Study of Some Parallel Algebraic Preconditioners
In Talk at the IPDPS 2003 Conference, April 22-26, 2003, Nice, France, 2003. Status: Published
Publication Type | Talks, contributed |
Year of Publication | 2003 |
Location of Talk | Talk at the IPDPS 2003 Conference, April 22-26, 2003, Nice, France |
Computing the Electrical Activity in the Human Heart
In Presented at the European Conference on Numerical Mathematics and Advanced Applications, Prague, Czech Republic, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2003 |
Location of Talk | Presented at the European Conference on Numerical Mathematics and Advanced Applications, Prague, Czech Republic |
Computing the Electrical Activity in the Human Heart
In Presented at the Centre of Mathematics for Applications, Oslo, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2003 |
Location of Talk | Presented at the Centre of Mathematics for Applications, Oslo |
Computing the Heart
In Presented at the 21st CAD-FEM users' meeting 2003 - International congress on FEM technology, Potsdam, Germany, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2003 |
Location of Talk | Presented at the 21st CAD-FEM users' meeting 2003 - International congress on FEM technology, Potsdam, Germany |
Mathematical and Numerical Modeling of Medical Ultrasound Wave Propagation
In Invited talk to MACSI-Workshop for Numerical Simulations for Ultrasound Imaging and Inversion, St. Georgen, Austria, pages 8-13, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2003 |
Location of Talk | Invited talk to MACSI-Workshop for Numerical Simulations for Ultrasound Imaging and Inversion, St. Georgen, Austria, pages 8-13 |
Parallel Algorithms for Simulating the Electrical Activity of the Heart
In Presented at the Dagstuhl seminar Challenges in computational science and engineering, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2003 |
Location of Talk | Presented at the Dagstuhl seminar Challenges in computational science and engineering |
Notes | Presented by Joakim Sundnes, March 2003. |
Toward Extremely High-Resolution Simulation of Human Heart
In Talk at the ParCo 2003 Conference, 2 - 5 September 2003, Dresden, Germany, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2003 |
Location of Talk | Talk at the ParCo 2003 Conference, 2 - 5 September 2003, Dresden, Germany |
Book Chapter
Overlapping Domain Decomposition Methods
In Advanced Topics in Computational Partial Differential Equations - Numerical Methods and Diffpack Programming, 57-95. Springer, 2003. Status: Published
Publication Type | Book Chapter |
Year of Publication | 2003 |
Book Title | Advanced Topics in Computational Partial Differential Equations - Numerical Methods and Diffpack Programming |
Pagination | 57-95 |
Publisher | Springer |
Parallel Computing
In Advanced Topics in Computational Partial Differential Equations - Numerical Methods and Diffpack Programming, 1-55. Lecture Notes in Computational Science and Engineering. Springer, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Book Chapter |
Year of Publication | 2003 |
Book Title | Advanced Topics in Computational Partial Differential Equations - Numerical Methods and Diffpack Programming |
Secondary Title | Lecture Notes in Computational Science and Engineering |
Pagination | 1-55 |
Publisher | Springer |
Performance Modeling of PDE Solvers
In Advanced Topics in Computational Partial Differential Equations - Numerical Methods and Diffpack Programming, 361-399. Lecture Notes in Computational Science and Engineering. Springer, 2003. Status: Published
Affiliation | Scientific Computing |
Publication Type | Book Chapter |
Year of Publication | 2003 |
Book Title | Advanced Topics in Computational Partial Differential Equations - Numerical Methods and Diffpack Programming |
Secondary Title | Lecture Notes in Computational Science and Engineering |
Pagination | 361-399 |
Publisher | Springer |
Talks, contributed
Developing Parallel Object-Oriented Simulation Codes in Diffpack
In Invited talk at the Fifth World Congress on Computational Mechanics, Vienna, Austria, 2002. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2002 |
Location of Talk | Invited talk at the Fifth World Congress on Computational Mechanics, Vienna, Austria |
Notes | Presented by X. Cai |
Diffpack Simulation of the Electrical Activity in the Heart
In Invited minisymposium talk at the 20th CAD-FEM User's Meeting, Friedrichshafen, Germany, 2002. Status: Published
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2002 |
Location of Talk | Invited minisymposium talk at the 20th CAD-FEM User's Meeting, Friedrichshafen, Germany |
Notes | Presented by A. M. Bruaset |
Proceedings, refereed
Developing Parallel Object-Oriented Simulation Codes in Diffpack
In Proceedings of the Fifth World Congress on Computational Mechanics. Vienna University of Technology, 2002. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2002 |
Conference Name | Proceedings of the Fifth World Congress on Computational Mechanics |
Place Published | Vienna University of Technology |
Notes | ISBN 3-9501554-0-6 |
Enabling Numerical and Software Technologies for Studying the Electrical Activity in Human Heart
In Applied Parallel Computing - Advanced Scientific Computing, 6th International Conference, PARA 2002. Lecture Notes in Computer Science. Espoo, Finland: Springer-Verlag, 2002. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2002 |
Conference Name | Applied Parallel Computing - Advanced Scientific Computing, 6th International Conference, PARA 2002 |
Pagination | 3-17 |
Publisher | Springer-Verlag |
Place Published | Espoo, Finland |
Parallel Iterative Methods in Modern Physical Applications
In Computational Science - ICCS 2002. Lecture Notes in Computer Science. Springer-Verlag, 2002. Status: Published
Publication Type | Proceedings, refereed |
Year of Publication | 2002 |
Conference Name | Computational Science - ICCS 2002 |
Pagination | 345-355 |
Publisher | Springer-Verlag |
Technical reports
A Parallel Solution of the Bidomain Equations Modeling the Electrical Activity of the Heart
Simula Research Laboratory, 2001. Status: Published
Affiliation | Scientific Computing |
Publication Type | Technical reports |
Year of Publication | 2001 |
Publisher | Simula Research Laboratory |
Proceedings, refereed
A Software Framework for Easy Parallelization of PDE Solvers
In Proceedings of Parallel Computational Fluid Dynamics 2000. North Holland, 2001. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2001 |
Conference Name | Proceedings of Parallel Computational Fluid Dynamics 2000 |
Publisher | North Holland |
How Modern Programming Techniques Can Greatly Simplify the Development of Parallel Simulation Codes in Computational Mechanics
In Proceedings of the MekIT'01 Conference. Tapir, 2001. Status: Published
Affiliation | Scientific Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2001 |
Conference Name | Proceedings of the MekIT'01 Conference |
Publisher | Tapir |
On the Performance of PC Clusters in Solving Partial Differential Equations
In Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001. Status: Published
Publication Type | Proceedings, refereed |
Year of Publication | 2001 |
Conference Name | Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing |
Partition of Unstructured Finite Element Meshes by a Multilevel Approach
In Applied Parallel Computing - New Paradigms for HPC in Industry and Academia, 5th International Conference, PARA 2000. Lecture Notes in Computer Science. Bergen, Norway: Springer-Verlag, 2001.Status: Published
Partition of Unstructured Finite Element Meshes by a Multilevel Approach
Publication Type | Proceedings, refereed |
Year of Publication | 2001 |
Conference Name | Applied Parallel Computing - New Paradigms for HPC in Industry and Academia, 5th International Conference, PARA 2000 |
Pagination | 187-195 |
Publisher | Springer-Verlag |
Place Published | Bergen, Norway |
Talks, contributed
How Modern Programming Techniques Can Greatly Simplify the Development of Parallel Simulation Codes in Computational Mechanics
In Talk at the National Conference on Computational Mechanics (MekIT'01), Trondheim, Norway, 2001.Status: Published
How Modern Programming Techniques Can Greatly Simplify the Development of Parallel Simulation Codes in Computational Mechanics
Affiliation | Scientific Computing |
Publication Type | Talks, contributed |
Year of Publication | 2001 |
Location of Talk | Talk at the National Conference on Computational Mechanics (MekIT'01), Trondheim, Norway |
Notes | Presented by X. Cai |
Talks, contributed
A Software Framework for Easy Parallelization of PDE Solvers
In Keynote lecture at the Parallel CFD 2000 Conference, Trondheim, Norway, 2000.Status: Published
A Software Framework for Easy Parallelization of PDE Solvers
Publication Type | Talks, contributed |
Year of Publication | 2000 |
Location of Talk | Keynote lecture at the Parallel CFD 2000 Conference, Trondheim, Norway |
Notes | Presented by H. P. Langtangen |
A Software Strategy for Easy Parallelization of Sequential PDE Solvers
In Talk at the minisymposium on Modern Software Aspects for PDE Solvers (organized by H. P. Langtangen and Stefan Turek (University of Dortmund)) at the IMACS 2000 Conference, Lausanne, Switzerland, 2000.Status: Published
A Software Strategy for Easy Parallelization of Sequential PDE Solvers
Publication Type | Talks, contributed |
Year of Publication | 2000 |
Location of Talk | Talk at the minisymposium on Modern Software Aspects for PDE Solvers (organized by H. P. Langtangen and Stefan Turek (University of Dortmund)) at the IMACS 2000 Conference, Lausanne, Switzerland |
Notes | Presented by H. P. Langtangen |
Proceedings, refereed
An Object-Oriented Software Framework for Building Parallel Navier-Stokes Solvers
In Parallel Computational Fluid Dynamics. Elsevier, 2000.Status: Published
An Object-Oriented Software Framework for Building Parallel Navier-Stokes Solvers
Publication Type | Proceedings, refereed |
Year of Publication | 2000 |
Conference Name | Parallel Computational Fluid Dynamics |
Pagination | 147-154 |
Publisher | Elsevier |
Parallel Simulation of 3D Nonlinear Acoustic Fields on a Linux-Cluster
In Proceedings of 2nd IEEE International Conference on Cluster Computing, Germany. IEEE, 2000.Status: Published
Parallel Simulation of 3D Nonlinear Acoustic Fields on a Linux-Cluster
Publication Type | Proceedings, refereed |
Year of Publication | 2000 |
Conference Name | Proceedings of 2nd IEEE International Conference on Cluster Computing, Germany |
Pagination | 185-192 |
Publisher | IEEE |
Journal Article
Parallel Multilevel Methods With Adaptivity on Unstructured Grids
Computing and Visualization in Science 3 (2000): 133-146.Status: Published
Parallel Multilevel Methods With Adaptivity on Unstructured Grids
Publication Type | Journal Article |
Year of Publication | 2000 |
Journal | Computing and Visualization in Science |
Volume | 3 |
Number | 3 |
Pagination | 133-146 |
Journal Article
An Analysis of a Preconditioner for the Discretized Pressure Equation Arising in Reservoir Simulation
IMA Journal of Numerical Analysis 19 (1999): 291-316.Status: Published
An Analysis of a Preconditioner for the Discretized Pressure Equation Arising in Reservoir Simulation
Publication Type | Journal Article |
Year of Publication | 1999 |
Journal | IMA Journal of Numerical Analysis |
Volume | 19 |
Number | 2 |
Pagination | 291-316 |
Talks, contributed
An Object-Oriented Software Framework for Building Parallel Navier-Stokes Solvers
In Talk at Parallel CFD'99, Williamsburg, Virginia, USA, 1999.Status: Published
An Object-Oriented Software Framework for Building Parallel Navier-Stokes Solvers
Publication Type | Talks, contributed |
Year of Publication | 1999 |
Location of Talk | Talk at Parallel CFD'99, Williamsburg, Virginia, USA |
Notes | Presented by X. Cai |
Proceedings, refereed
Two Object-Orientated Approaches to the Parallelization of Diffpack
In Proceedings of the HiPer'99 Conference, 1999.Status: Published
Two Object-Orientated Approaches to the Parallelization of Diffpack
Publication Type | Proceedings, refereed |
Year of Publication | 1999 |
Conference Name | Proceedings of the HiPer'99 Conference |
Journal Article
A Finite Element Method for Fully Nonlinear Water Waves
J. Comput. Phys. 143 (1998): 544-568.Status: Published
A Finite Element Method for Fully Nonlinear Water Waves
Publication Type | Journal Article |
Year of Publication | 1998 |
Journal | J. Comput. Phys. |
Volume | 143 |
Number | 2 |
Pagination | 544-568 |
Application of Cauchy Integrals and Singular Integral Equations in Scattered Data Problems
BIT 38 (1998): 242-255.Status: Published
Application of Cauchy Integrals and Singular Integral Equations in Scattered Data Problems
Publication Type | Journal Article |
Year of Publication | 1998 |
Journal | BIT |
Volume | 38 |
Pagination | 242-255 |
Proceedings, refereed
Domain Decomposition in High-Level Parallelization of PDE Codes
In Proceedings of the 11th international conference on Domain Decomposition Methods, 1998.Status: Published
Domain Decomposition in High-Level Parallelization of PDE Codes
Publication Type | Proceedings, refereed |
Year of Publication | 1998 |
Conference Name | Proceedings of the 11th international conference on Domain Decomposition Methods |
Numerical Simulation of 3D Fully Nonlinear Water Waves on Parallel Computers
In Applied Parallel Computing - Large Scale Scientific and Industrial Problems, 4th International Conference, PARA'98. Lecture Notes in Computer Science. Umeå, Sweden: Springer-Verlag, 1998.Status: Published
Numerical Simulation of 3D Fully Nonlinear Water Waves on Parallel Computers
Publication Type | Proceedings, refereed |
Year of Publication | 1998 |
Conference Name | Applied Parallel Computing - Large Scale Scientific and Industrial Problems, 4th International Conference, PARA'98 |
Pagination | 48-55 |
Publisher | Springer-Verlag |
Place Published | Umeå, Sweden |
PhD Thesis
Numerical Methods for Partial Differential Equations and Their Object-Oriented Parallel Implementations
Department of Informatics, University of Oslo, 1998.Status: Published
Numerical Methods for Partial Differential Equations and Their Object-Oriented Parallel Implementations
Publication Type | PhD Thesis |
Year of Publication | 1998 |
Publisher | Department of Informatics, University of Oslo |
Thesis Type | PhD |
Technical reports
Performance Modeling of PDE Solvers
Department of Informatics, University of Oslo, 1998.Status: Published
Performance Modeling of PDE Solvers
Publication Type | Technical reports |
Year of Publication | 1998 |
Number | 1998-3 |
Publisher | Department of Informatics, University of Oslo |
Talks, contributed
Animation of Wave Forces on Offshore Installations in IRIS Explorer
In Render Issue 8 - The newsletter for IRIS Explorer users, 1997.Status: Published
Animation of Wave Forces on Offshore Installations in IRIS Explorer
Publication Type | Talks, contributed |
Year of Publication | 1997 |
Location of Talk | Render Issue 8 - The newsletter for IRIS Explorer users |
Design Issues and Recent Developments in Diffpack
In Invited minisymposium talk at the SIAM Annual meeting, Stanford University, California, USA, 1997.Status: Published
Design Issues and Recent Developments in Diffpack
Publication Type | Talks, contributed |
Year of Publication | 1997 |
Location of Talk | Invited minisymposium talk at the SIAM Annual meeting, Stanford University, California, USA |
Notes | Presented by A. M. Bruaset |
Diffpack: an Object-Oriented Software Environment for Scientific Computing
In Invited minisymposium talk at the Fourth US National Congress on Computational Mechanics, San Francisco, 1997.Status: Published
Diffpack: an Object-Oriented Software Environment for Scientific Computing
Publication Type | Talks, contributed |
Year of Publication | 1997 |
Location of Talk | Invited minisymposium talk at the Fourth US National Congress on Computational Mechanics, San Francisco |
Notes | Presented by X. Cai |
Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators
In Talk at the ISCOPE Conference 1997, California, 1997.Status: Published
Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators
Publication Type | Talks, contributed |
Year of Publication | 1997 |
Location of Talk | Talk at the ISCOPE Conference 1997, California |
Notes | Presented by X. Cai |
Proceedings, refereed
Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators
In Scientific Computing in Object-Oriented Parallel Environments. Lecture Notes in Computer Science. Springer-Verlag, 1997.Status: Published
Numerical Solution of PDEs on Parallel Computers Utilizing Sequential Simulators
Publication Type | Proceedings, refereed |
Year of Publication | 1997 |
Conference Name | Scientific Computing in Object-Oriented Parallel Environments |
Pagination | 161-168 |
Publisher | Springer-Verlag |
Book Chapter
Two Fragments of a Method for Fully Nonlinear Simulations of Water Waves
In Waves and Nonlinear Processes in Hydrodynamics, 37-50. Kluwer Academic Publishers, 1996.Status: Published
Two Fragments of a Method for Fully Nonlinear Simulations of Water Waves
Publication Type | Book Chapter |
Year of Publication | 1996 |
Book Title | Waves and Nonlinear Processes in Hydrodynamics |
Pagination | 37-50 |
Publisher | Kluwer Academic Publishers |
Talks, contributed
A Preconditioner for the Pressure Equation in Reservoir Simulation
In Presented at Institut für Mathematik, Johannes Kepler Universität in Linz, Austria, 1995.Status: Published
A Preconditioner for the Pressure Equation in Reservoir Simulation
Publication Type | Talks, contributed |
Year of Publication | 1995 |
Location of Talk | Presented at Institut für Mathematik, Johannes Kepler Universität in Linz, Austria |
A Preconditioner for the Pressure Equation in Reservoir Simulation
In Presented at Institut für Mathematik, Johannes Kepler Universität in Linz, Austria, 1995.Status: Submitted
A Preconditioner for the Pressure Equation in Reservoir Simulation
Publication Type | Talks, contributed |
Year of Publication | 1995 |
Location of Talk | Presented at Institut für Mathematik, Johannes Kepler Universität in Linz, Austria |
Technical reports
A B-Spline Package in C++
SINTEF, 1994.Status: Published
A B-Spline Package in C++
Publication Type | Technical reports |
Year of Publication | 1994 |
Number | STF33 A94048 |
Publisher | SINTEF |