Authors: L. Li, H. Hoefsloot, A. A. de Graaf, E. A. Ataman, and A. K. Smilde
Title: Exploring Dynamic Metabolomics Data With Multiway Data Analysis: a Simulation Study
Affiliation: Machine Learning
Project(s): Department of Data Science and Knowledge Discovery, TrACEr: Time-Aware ConstrainEd Multimodal Data Fusion
Status: Published
Publication Type: Journal Article
Year of Publication: 2022
Journal: BMC Bioinformatics
Volume: 23
Number: Article 31
Date Published: 2022
Publisher: Springer
Abstract

Background: Analysis of dynamic metabolomics data holds the promise to improve our understanding of underlying mechanisms in metabolism. For example, it may detect changes in metabolism due to the onset of a disease. Dynamic or time-resolved metabolomics data can be arranged as a three-way array with entries organized according to a subjects mode, a metabolites mode and a time mode. While such time-evolving multiway data sets are increasingly collected, revealing the underlying mechanisms and their dynamics from such data remains challenging. For such data, one of the complexities is the presence of a superposition of several sources of variation: induced variation (due to experimental conditions or inborn errors), individual variation, and measurement error. Multiway data analysis (also known as tensor factorizations) has been successfully used in data mining to find the underlying patterns in multiway data. In this paper, we study the use of multiway data analysis to reveal the underlying patterns and dynamics in time-resolved metabolomics data.

Results: We focus on simulated data arising from different dynamic models of increasing complexity, i.e., a simple linear system, a yeast glycolysis model, and a human cholesterol model. We generate data with induced variation as well as individual variation. Systematic experiments are performed to demonstrate the advantages and limitations of multiway data analysis in analyzing such dynamic metabolomics data and their capacity to disentangle the different sources of variations. We choose to use simulations since we want to understand the capability of multiway data analysis methods which is facilitated by knowing the ground truth.

Conclusion: Our numerical experiments demonstrate that despite the increasing complexity of the studied dynamic metabolic models, tensor factorization methods CANDECOMP/PARAFAC (CP) and Parallel Profiles with Linear Dependences (Paralind) can disentangle the sources of variations and thereby reveal the underlying mechanisms and their dynamics.
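
As an illustration of the subjects x metabolites x time arrangement and the CP model named in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a toy array built from random two-component factors (not one of the dynamic models in the paper) and fits it with a plain alternating least squares loop. The function names `khatri_rao`, `cp_reconstruct`, and `cp_als`, the array sizes, and the iteration count are all hypothetical choices made for this example. Paralind is not covered.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; rows are ordered with V's index fastest."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])


def cp_reconstruct(A, B, C):
    """Rebuild X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_als(X, rank, n_iter=500, seed=0):
    """Plain alternating least squares for a three-way CP model (illustrative only)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Unfold the array along each mode so that every update is a
    # linear least-squares problem in one factor matrix.
    X0 = X.reshape(I, J * K)                     # subjects    x (metabolites, time)
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # metabolites x (subjects, time)
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # time        x (subjects, metabolites)
    for _ in range(n_iter):
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C


# Hypothetical toy data: random factors stand in for subject scores,
# metabolite loadings, and time profiles (assumed, not taken from the paper).
rng = np.random.default_rng(42)
n_subjects, n_metabolites, n_time = 6, 5, 15
A_true = rng.standard_normal((n_subjects, 2))
B_true = rng.standard_normal((n_metabolites, 2))
C_true = rng.standard_normal((n_time, 2))
X = cp_reconstruct(A_true, B_true, C_true)       # shape: subjects x metabolites x time

A_hat, B_hat, C_hat = cp_als(X, rank=2)
rel_error = np.linalg.norm(X - cp_reconstruct(A_hat, B_hat, C_hat)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.2e}")
```

On noise-free data of exact rank two, the recovered factors should match the true ones only up to permutation and scaling of the components, which is the usual indeterminacy of the CP model. For real analyses, a maintained tensor library would be a safer choice than this hand-written loop.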

DOI: 10.1186/s12859-021-04550-5
Citation Key: 28151
