Authors: A. M. Bruaset, G. T. Lines, and J. Sundnes
Editors: A. Elmokashfi, O. Lysne, and V. Naumova
Title: Data aggregation and anonymization for mathematical modeling and epidemiological studies
Affiliation: Scientific Computing
Project(s): Department of Computational Physiology
Publication Type: Book Chapter
Year of Publication: 2022
Book Title: Smittestopp - A Case Study on Digital Contact Tracing
Publisher: Springer International Publishing
Place Published: Cham
ISBN Number: 978-3-031-05466-2

An important secondary purpose of the Smittestopp development was to provide aggregated data sets describing mobility and social interactions in Norway's population. The data were to be used to monitor the effect of government regulations and recommendations, provide input to advanced computational models to predict the pandemic's spread, and provide input to fundamental epidemiology research. In this chapter we describe the challenges and technical solutions of Smittestopp's data aggregation, as well as preliminary results from the time period when the app was active.

We first give a detailed overview of the requirements, specifying the types of data to be collected and the level of spatial and temporal aggregation. We then describe the concepts for anonymization via k-anonymity and ε-differential privacy (ε-DP), and the technical solutions for collecting and aggregating data from the database. In particular, we present details of how GPS and Bluetooth events were mapped to geographical regions and points of interest, and the solutions employed for efficient data retrieval and processing. The preliminary results demonstrate how the recorded GPS and Bluetooth events match expected temporal and spatial variations in mobility and social interactions, and indicate the usefulness of the aggregated data as a tool for pandemic monitoring and research. One of the main criticisms of Smittestopp concerns the centralized storage of individuals' movements, even if such data were used and presented only at an aggregated and anonymized level. In this chapter, we also outline a completely different approach, where the GPS data do not leave the user's phone but are, instead, pre-processed to a much higher level of privacy before being dispatched to a server-side data aggregation algorithm. This approach, which would make the app significantly less intrusive, is made possible by recent advances in determining close contacts from Bluetooth data, either by a revised Smittestopp algorithm or by means of the Google/Apple Exposure Notification framework.