Projects
Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization

PCI Express (PCIe) is widely used today for static local input/output (I/O) expansion and is gaining momentum as a high-speed host-to-host interconnect. However, these two uses currently cannot be integrated onto a single hardware infrastructure; two separate PCIe networks are needed.
Dolphin has developed interconnect solutions since the early 1990s. Its current product line is based exclusively on PCIe interface cards and switches and competes with alternative high-speed communication technologies. A key advantage of Dolphin’s PCIe products is their lower protocol overhead compared with technologies such as 10Gb Ethernet, InfiniBand, and other proprietary interconnects. Dolphin is the market leader in providing fast, optimized, and easy-to-use software and hardware products that enable PCIe to be used as a high-speed interconnect.
I/O devices are typically statically assigned to a single root complex (host), and features such as hot-add, device migration, device sharing, and remote access are not supported in a flexible way.
In principle, any connected remote computer can access all I/O devices on a shared PCIe fabric directly, using PCIe Non-Transparent Bridging (NTB) addressing techniques, but the required software structure, OS interfaces, and implementations do not yet exist.
Device sharing is addressed by the PCIe Single Root I/O Virtualization (SR-IOV) and Multi-Root I/O Virtualization (MR-IOV) specifications, but these remain static approaches and require PCIe devices to be developed according to those specifications.
Because an I/O device is normally owned by a single compute node, its I/O data are usually relayed to remote servers through that host using traditional networking services. Using PCIe NTB techniques, I/O devices can instead transfer data directly to a remote node, which will significantly reduce latency and overhead during data transfers.
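The NTB addressing idea can be illustrated with a minimal sketch, assuming a simplified model in which an NTB exposes a window (a BAR) in the local host's address space, split into equal-sized segments, and each segment has a lookup-table entry pointing into the remote host's address space. A local read or write that lands in a mapped segment is forwarded to the corresponding remote address. The class `NtbWindow`, its fields, and all addresses below are hypothetical and do not represent Dolphin's software or any real NTB driver API.

```python
from dataclasses import dataclass, field


@dataclass
class NtbWindow:
    """Hypothetical model of one NTB BAR window split into equal-sized segments.

    Each segment has a lookup-table (LUT) entry that points into the remote
    host's address space (e.g. a device BAR or a RAM buffer on that host).
    """

    local_base: int    # start of the window in the local physical address space
    segment_size: int  # bytes covered by one LUT entry
    num_segments: int  # number of LUT entries in the window
    lut: dict[int, int] = field(default_factory=dict)  # segment index -> remote base

    def map_segment(self, index: int, remote_base: int) -> None:
        """Point one segment of the window at an aligned remote address."""
        if not 0 <= index < self.num_segments:
            raise IndexError(f"segment {index} is outside the window")
        if remote_base % self.segment_size:
            raise ValueError("remote base must be aligned to the segment size")
        self.lut[index] = remote_base

    def translate(self, local_addr: int) -> int:
        """Return the remote address reached by a local access at local_addr."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.segment_size * self.num_segments:
            raise ValueError(f"{local_addr:#x} is outside the NTB window")
        index, within = divmod(offset, self.segment_size)
        if index not in self.lut:
            raise LookupError(f"segment {index} is not mapped")
        return self.lut[index] + within


if __name__ == "__main__":
    # Hypothetical layout: a 4 MiB window split into 1 MiB segments.
    win = NtbWindow(local_base=0xF000_0000, segment_size=0x10_0000, num_segments=4)
    win.map_segment(0, 0x8000_0000)    # e.g. a BAR of a device in the remote host
    win.map_segment(1, 0x2_0000_0000)  # e.g. a DMA buffer in remote RAM
    print(hex(win.translate(0xF000_0010)))  # 0x80000010
    print(hex(win.translate(0xF010_0040)))  # 0x200000040
```

Keeping the translation in a per-segment table, rather than one fixed offset for the whole window, is what would let different parts of a single window reach different remote devices or memory regions; the software framework the project describes would be responsible for programming such tables.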
Final goal:
The goal of this project is to develop a new framework for the operating system and virtual machines that enables remote discovery, addressing, access, and use of standard PCIe devices. The framework will allow standard PCIe devices to be reallocated and shared among computer nodes in the PCIe network with no or minimal changes to applications and device drivers. On top of this framework, we will develop services to validate and demonstrate it, such as a fast cluster file system, legacy device driver access to I/O devices on a remote node, and clustering of accelerator cards such as the Intel Xeon Phi co-processor and Nvidia Tesla graphics processing units.
Funding source:
The Research Council of Norway
All partners:
Dolphin Interconnect Solutions
Publications for Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization
PhD Thesis
SmartIO: Device sharing and memory disaggregation in PCIe clusters using non-transparent bridging
In The University of Oslo. Vol. PhD. University of Oslo (UiO), 2022. Status: Published
Distributed and parallel computing applications are becoming increasingly compute-heavy and data-driven, accelerating the need for disaggregation solutions that enable sharing of I/O resources between networked machines. For example, in a heterogeneous computing cluster, different machines may have different devices available to them, but distributing I/O resources in a way that maximizes both resource utilization and overall cluster performance is a challenge. To facilitate device sharing and memory disaggregation among machines connected using PCIe non-transparent bridges, we present SmartIO. SmartIO makes all machines in the cluster, including their internal devices and memory, part of a common PCIe domain. By leveraging the memory mapping capabilities of non-transparent bridges, remote resources may be used directly, as if these resources were local to the machines using them. Whether devices are local or remote is made transparent by SmartIO. NVMes, GPUs, FPGAs, NICs, and any other PCIe device can be dynamically shared with and distributed to remote machines, and it is even possible to disaggregate devices and memory, in order to share component parts with multiple machines at the same time. Software is entirely removed from the performance-critical path, allowing remote resources to be used with native PCIe performance. To demonstrate that SmartIO is an efficient solution, we have performed a comprehensive evaluation consisting of a wide range of performance experiments, including both synthetic benchmarks and realistic, large-scale workloads. Our experimental results show that remote resources can be used without any performance overhead compared to using local resources, in terms of throughput and latency. Thus, compared to existing disaggregation solutions, SmartIO provides more efficient, low-cost resource sharing, increasing the overall system performance and resource utilization.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | PhD Thesis |
Year of Publication | 2022 |
Degree awarding institution | The University of Oslo |
Degree | PhD |
Number of Pages | 236 |
Date Published | 10/2022 |
Publisher | University of Oslo (UiO) |
Thesis Type | Paper Collection |
URL | https://www.duo.uio.no/handle/10852/97351 |
Proceedings, refereed
Host Bypassing: Let your GPU speak Ethernet
In IEEE 8th International Conference on Network Softwarization (NetSoft). IEEE, 2022. Status: Published
Hardware acceleration of network functions is essential to meet the challenging Quality of Service requirements of today's computer networks. Graphics Processing Units (GPUs) are a widely deployed technology that can also be used for general computing tasks, including the acceleration of network functions. In this work, we demonstrate how commodity GPUs, which do not provide any network interfaces, can be used to accelerate network functions. Our approach leverages PCIe peer-to-peer capabilities and allows the GPU to control the network interface card directly, without any assistance from the operating system or control application. The presented evaluation results demonstrate the feasibility of our approach, which achieves up to 10 Gbit/s, even for small packets.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2022 |
Conference Name | IEEE 8th International Conference on Network Softwarization (NetSoft) |
Pagination | 85-90 |
Date Published | 06/2022 |
Publisher | IEEE |
ISBN Number | 978-1-6654-0694-9 |
URL | https://ieeexplore.ieee.org/document/9844090 |
DOI | 10.1109/NetSoft54395.2022.9844090 |
Journal Article
SmartIO: Zero-overhead Device Sharing through PCIe Networking
ACM Transactions on Computer Systems 38, no. 1-2 (2021): 1-78. Status: Published
The large variety of compute-heavy and data-driven applications accelerates the need for a distributed I/O solution that enables cost-effective scaling of resources between networked hosts. For example, in a cluster system, different machines may have various devices available at different times, but moving workloads to remote units over the network is often costly and introduces large overheads compared to accessing local resources. To facilitate I/O disaggregation and device sharing among hosts connected using Peripheral Component Interconnect Express (PCIe) non-transparent bridges, we present SmartIO. NVMes, GPUs, network adapters, or any other standard PCIe device may be borrowed and accessed directly, as if they were local to the remote machines. We provide capabilities beyond existing disaggregation solutions by combining traditional I/O with distributed shared-memory functionality, allowing devices to become part of the same global address space as cluster applications. Software is entirely removed from the data path, and simultaneous sharing of a device among application processes running on remote hosts is enabled. Our experimental results show that I/O devices can be shared with remote hosts, achieving native PCIe performance. Thus, compared to existing device distribution mechanisms, SmartIO provides more efficient, low-cost resource sharing, increasing the overall system performance.
Affiliation | Communication Systems, Machine Learning |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | ACM Transactions on Computer Systems |
Volume | 38 |
Issue | 1-2 |
Number | 2 |
Pagination | 1–78 |
Date Published | 07/2021 |
Publisher | Association for Computing Machinery |
Place Published | New York, NY, United States |
ISSN | 0734-2071 |
URL | https://dl.acm.org/doi/10.1145/3462545 |
DOI | 10.1145/3462545 |
Proceedings, refereed
Host Bypassing: Direct Data Piping from the Network to the Hardware Accelerator
In IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). IEEE, 2021. Status: Published
Computer networks have become very important and influential in recent years, both for common services such as Internet connectivity and for time-sensitive applications such as videotelephony. Furthermore, approaches like in-network computing allow latency-critical and high-performance network functions, e.g., 5G network functions, to be offloaded into the network to support such time-sensitive applications. In this work, we show how FPGAs in PCIe-based systems, which are typically used as hardware accelerators for latency-critical in-network functions, can be integrated into the data path. Our approach, named host bypassing, allows direct data transfer from the network interface to the accelerator and achieves substantial performance benefits over existing state-of-the-art approaches. Our detailed evaluation results demonstrate that deterministic low latency can be achieved under heavy load without any packet loss. In addition, fewer CPU resources are required.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) |
Pagination | 23-30 |
Date Published | 12/2021 |
Publisher | IEEE |
ISBN Number | 978-1-6654-3860-5 |
URL | https://ieeexplore.ieee.org/document/9691977 |
DOI | 10.1109/MCSoC51149.2021.00012 |
Journal Article
Flexible device compositions and dynamic resource sharing in PCIe interconnected clusters using Device Lending
Cluster Computing 22, no. 86 (2019): 1-24. Status: Published
Modern workloads often exceed the processing and I/O capabilities provided by resource virtualization, requiring direct access to the physical hardware in order to reduce latency and computing overhead. For computers interconnected in a cluster, access to remote hardware resources often requires facilitation both in hardware and specialized drivers with virtualization support. This limits the availability of resources to specific devices and drivers that are supported by the virtualization technology being used, as well as what the interconnection technology supports. For PCI Express (PCIe) clusters, we have previously proposed Device Lending as a solution for enabling direct low latency access to remote devices. The method has extremely low computing overhead and does not require any application- or device-specific distribution mechanisms. Any PCIe device, such as network cards, disks, and GPUs, can easily be shared among the connected hosts. In this work, we have extended our solution with support for a virtual machine (VM) hypervisor. Physical remote devices can be “passed through” to VM guests, enabling direct access to physical resources while still retaining the flexibility of virtualization. Additionally, we have also implemented multi-device support, enabling shortest-path peer-to-peer transfers between remote devices residing in different hosts. Our experimental results show that multiple remote devices can be used, achieving bandwidth and latency close to native PCIe, and without requiring any additional support in device drivers. I/O intensive workloads run seamlessly using both local and remote resources. With our added VM and multi-device support, Device Lending offers highly customizable configurations of remote devices that can be dynamically reassigned and shared to optimize resource utilization, thus enabling a flexible composable I/O infrastructure for VMs as well as bare-metal machines.
Affiliation | Communication Systems, Machine Learning |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, LADIO: Live Action Data Input/Output, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2019 |
Journal | Cluster Computing |
Volume | 22 |
Issue | 86 |
Pagination | 1-24 |
Date Published | 09/2019 |
Publisher | Springer |
ISSN | 1573-7543 |
URL | https://link.springer.com/article/10.1007/s10586-019-02988-0 |
DOI | 10.1007/s10586-019-02988-0 |
Talks, invited
Dynamic Sharing of GPUs and IO in a PCIe Network
In GPU Technology Conference, San Jose, CA, USA. Nvidia, 2019. Status: Published
Learn how GPUs, NVMe drives and other IO devices can be efficiently shared in a PCI Express network using SmartIO from Dolphin Interconnect Solutions.
Traditionally, IO devices are statically assigned to a single root complex (host machine), and features such as hot-add, device migration and remote access are not supported flexibly without complex software frameworks. SmartIO eliminates these restrictions and provides a flexible framework for handling PCIe devices and systems. Devices such as GPUs, NVMe drives and other IO devices can be flexibly accessed from remote systems.
We demonstrate how SmartIO is implemented using standard PCIe and Non-Transparent Bridging, and show that our system achieved near-native performance when moving data between borrowed GPUs and NVMe drives. We also show how we can dynamically add more GPUs to scale performance.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | GPU Technology Conference, San Jose, CA, USA |
Publisher | Nvidia |
Proceedings, refereed
Flexible Device Sharing in PCIe Clusters using Device Lending
In International Conference on Parallel Processing Companion (ICPP'18 Comp). ACM, 2018. Status: Published
Processing workloads may have very high IO demands, exceeding the capabilities provided by resource virtualization and requiring direct access to the physical hardware. For computers that are interconnected in PCI Express (PCIe) networks, we have previously proposed Device Lending as a solution for assigning devices to remote hosts. In this paper, we explain how we have extended our implementation with support for the Linux Kernel-based Virtual Machine (KVM) hypervisor. Using our extended Device Lending, it becomes possible to dynamically “pass through” physical remote devices to VM guests while still retaining the flexibility of virtualization, something that previously required extensive facilitation in both hypervisor and device drivers in the form of paravirtualization.
We have also improved our original implementation with support for interoperability between remote devices. We show that it is possible to use multiple devices residing in different hosts, while still achieving the same bandwidth and latency as native PCIe, and without requiring any additional support in device drivers.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, LADIO: Live Action Data Input/Output, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | International Conference on Parallel Processing Companion (ICPP'18 Comp) |
Date Published | 08/2018 |
Publisher | ACM |
ISBN Number | 978-1-4503-6523-9/18/08 |
DOI | 10.1145/3229710.3229759 |
Autonomic Adaptation of Multimedia Content Adhering to Application Mobility
In Distributed Applications and Interoperable Systems (DAIS 2018). Lecture Notes in Computer Science ed. Vol. 10853. Springer, Cham, 2018. Status: Published
Today, many users of multimedia applications are surrounded by a changing set of multimedia-capable devices. However, users can move their running multimedia applications only to a predefined set of devices. Application mobility is the paradigm in which users can move their running applications (or parts of them) to heterogeneous devices in a seamless manner. In order to continue multimedia processing under the context changes implied by application mobility, applications need to adapt the presentation of multimedia content and their internal configuration. We propose DAMPAT, a system that implements an adaptation control loop to adapt multimedia pipelines. The exponential combinatorial growth of possible pipeline configurations is controlled by architectural constraints specified as high-level goals by application developers. Our evaluation shows that the pipeline only needs to be interrupted for a few tens of milliseconds to perform the reconfiguration. Thus, production or consumption of multimedia content can continue seamlessly across heterogeneous devices and user context changes.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | Distributed Applications and Interoperable Systems (DAIS 2018) |
Volume | 10853 |
Edition | Lecture Notes in Computer Science |
Pagination | 153-168 |
Date Published | 06/2018 |
Publisher | Springer, Cham |
ISBN Number | 978-3-319-93766-3 |
DOI | 10.1007/978-3-319-93767-0_11 |
Dynamic Adaptation of Multimedia Presentations for Videoconferencing in Application Mobility
In IEEE International Conference on Multimedia and Expo (ICME). San Diego, CA, USA: IEEE, 2018. Status: Published
Application mobility is the paradigm in which users can move their running applications to heterogeneous devices in a seamless manner. This mobility involves dynamic context changes of hardware, network resources, user environment, and user preferences. In order to continue multimedia processing under these context changes, applications need to adapt not only the collection of media streams, i.e., the multimedia presentation, but also their internal configuration to work on different hardware. We present a performance analysis of adapting a videoconferencing prototype application within a proposed adaptation control loop that autonomously adapts multimedia pipelines. Results show that the time spent to create an adaptation plan and execute it is on the order of hundreds of milliseconds. Reconfiguring pipelines, compared to building them from scratch, is approximately 1000 times faster when already instantiated hardware-dependent components are reused. Therefore, we conclude that the adaptation of multimedia pipelines is a feasible approach for multimedia applications that adhere to application mobility.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | IEEE International Conference on Multimedia and Expo (ICME) |
Date Published | 07/2018 |
Publisher | IEEE |
Place Published | San Diego, CA, USA |
ISSN Number | 1945-7871 |
DOI | 10.1109/ICME.2018.8486565 |
Talks, contributed
SmartIO: Dynamic Sharing of GPUs and IO in a PCIe Cluster
In GPU Technology Conference, San Jose, CA, USA. Nvidia, 2018. Status: Published
Learn how GPUs, NVMe drives and other IO devices can be efficiently shared in a PCI Express cluster using SmartIO from Dolphin Interconnect Solutions. Traditionally, IO devices have been statically assigned to a single root complex (host machine), and features such as hot-add, device migration and remote access are not supported in a flexible way without complex software frameworks. Dolphin SmartIO eliminates these restrictions and provides a flexible framework for handling PCIe devices and systems. Devices such as GPUs, NVMe drives and other IO devices can be flexibly accessed from remote systems. We demonstrate how SmartIO is implemented using standard PCIe and Non-Transparent Bridging, show that our system gets near-native performance when moving data from local GPUs to remote NVMe drives, and show how we can dynamically add more GPUs to scale performance.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Talks, contributed |
Year of Publication | 2018 |
Location of Talk | GPU Technology Conference, San Jose, CA, USA |
Publisher | Nvidia |
Publications
Journal Article
Nationwide rollout reveals efficacy of epidemic control through digital contact tracing
Nature Communications 12 (2021). Status: Published
Affiliation | Communication Systems, Scientific Computing, Machine Learning |
Project(s) | The Center for Resilient Networks and Applications, Department of Data Science and Knowledge Discovery, Department of Computational Physiology |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | Nature Communications |
Volume | 12 |
Number | 5918 |
Publisher | Springer Nature |
DOI | 10.1038/s41467-021-26144-8 |
SmartIO: Zero-overhead Device Sharing through PCIe Networking
ACM Transactions on Computer Systems 38, no. 1-2 (2021): 1-78. Status: Published
The large variety of compute-heavy and data-driven applications accelerates the need for a distributed I/O solution that enables cost-effective scaling of resources between networked hosts. For example, in a cluster system, different machines may have various devices available at different times, but moving workloads to remote units over the network is often costly and introduces large overheads compared to accessing local resources. To facilitate I/O disaggregation and device sharing among hosts connected using Peripheral Component Interconnect Express (PCIe) non-transparent bridges, we present SmartIO. NVMes, GPUs, network adapters, or any other standard PCIe device may be borrowed and accessed directly, as if they were local to the remote machines. We provide capabilities beyond existing disaggregation solutions by combining traditional I/O with distributed shared-memory functionality, allowing devices to become part of the same global address space as cluster applications. Software is entirely removed from the data path, and simultaneous sharing of a device among application processes running on remote hosts is enabled. Our experimental results show that I/O devices can be shared with remote hosts, achieving native PCIe performance. Thus, compared to existing device distribution mechanisms, SmartIO provides more efficient, low-cost resource sharing, increasing the overall system performance.
Affiliation | Communication Systems, Machine Learning |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | ACM Transactions on Computer Systems |
Volume | 38 |
Issue | 1-2 |
Number | 2 |
Pagination | 1–78 |
Date Published | 07/2021 |
Publisher | Association for Computing Machinery |
Place Published | New York, NY, United States |
ISSN | 0734-2071 |
URL | https://dl.acm.org/doi/10.1145/3462545 |
DOI | 10.1145/3462545 |
Using 3D Convolutional Neural Networks for Real-time Detection of Soccer Events
International Journal of Semantic Computing 15, no. 2 (2021): 161-187. Status: Published
Developing systems for the automatic detection of events in video is a task which has gained attention in many areas including sports. More specifically, event detection for soccer videos has been studied widely in the literature. However, there are still a number of shortcomings in the state-of-the-art such as high latency, making it challenging to operate at the live edge. In this paper, we present an algorithm to detect events in soccer videos in real time, using 3D convolutional neural networks. We test our algorithm on three different datasets from SoccerNet, the Swedish Allsvenskan, and the Norwegian Eliteserien. Overall, the results show that we can detect events with high recall, low latency, and accurate time estimation. The trade-off is a slightly lower precision compared to the current state-of-the-art, which has higher latency and performs better when a less accurate time estimation can be accepted. In addition to the presented algorithm, we perform an extensive ablation study on how the different parts of the training pipeline affect the final results.
Affiliation | Communication Systems, Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | International Journal of Semantic Computing |
Volume | 15 |
Issue | 2 |
Number | 2 |
Pagination | 161 - 187 |
Date Published | Jan-06-2021 |
Publisher | World Scientific |
ISSN | 1793-351X |
Keywords | 3d CNN, classification, Detection, soccer events, spotting |
URL | https://www.worldscientific.com/doi/abs/10.1142/S1793351X2140002X |
DOI | 10.1142/S1793351X2140002X |
Journal Article
HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy
Scientific Data 7, no. 1 (2020): 1-14. Status: Published
Artificial intelligence is currently a hot topic in medicine. However, medical data is often sparse and hard to obtain due to legal restrictions and a lack of medical personnel for the cumbersome and tedious process of manually labeling training data. These constraints make it difficult to develop systems for automatic analysis, like detecting disease or other lesions. In this respect, this article presents HyperKvasir, the largest image and video dataset of the gastrointestinal tract available today. The data was collected during real gastro- and colonoscopy examinations at Bærum Hospital in Norway and partly labeled by experienced gastrointestinal endoscopists. The dataset contains 110,079 images and 374 videos, and represents anatomical landmarks as well as pathological and normal findings. The total number of images and video frames together is around 1 million. Initial experiments demonstrate the potential benefits of artificial intelligence-based computer-assisted diagnosis systems. The HyperKvasir dataset can play a valuable role in developing better algorithms and computer-assisted examination systems not only for gastro- and colonoscopy, but also for other fields in medicine.
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Scientific Data |
Volume | 7 |
Issue | 1 |
Pagination | 1-14 |
Date Published | 08/2020 |
Publisher | Springer Nature |
Keywords | dataset, GI, Machine learning |
URL | http://www.nature.com/articles/s41597-020-00622-y |
DOI | 10.1038/s41597-020-00622-y |
Proceedings, refereed
PMData: a sports logging dataset
In Proceedings of the 11th ACM Multimedia Systems Conference. ACM, 2020. Status: Published
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | Proceedings of the 11th ACM Multimedia Systems Conference |
Pagination | 231-236 |
Publisher | ACM |
Real-Time Detection of Events in Soccer Videos using 3D Convolutional Neural Networks
In 2020 IEEE International Symposium on Multimedia (ISM). IEEE, 2020. Status: Published
In this paper, we present an algorithm for automatically detecting events in soccer videos using 3D convolutional neural networks. The algorithm uses a sliding window approach to scan over a given video to detect events such as goals, yellow/red cards, and player substitutions. We test the method on three different datasets from SoccerNet, the Swedish Allsvenskan, and the Norwegian Eliteserien. Overall, the results show that we can detect events with high recall, low latency, and accurate time estimation. The trade-off is a slightly lower precision compared to the current state-of-the-art, which has higher latency and performs better when a less accurate time estimation can be accepted. In addition to the presented algorithm, we perform an extensive ablation study on how the different parts of the training pipeline affect the final results.
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2020 |
Conference Name | 2020 IEEE International Symposium on Multimedia (ISM) |
Publisher | IEEE |
DOI | 10.1109/ISM.2020.00030 |
Proceedings, refereed
ACM Multimedia BioMedia 2019 Grand Challenge Overview
In The ACM International Conference on Multimedia (ACM MM). New York, New York, USA: ACM Press, 2019. Status: Published
The BioMedia 2019 ACM Multimedia Grand Challenge is the first in a series of competitions focusing on the use of multimedia for different medical use-cases. In this year’s challenge, the participants are asked to develop efficient algorithms which automatically detect a variety of findings commonly identified in the gastrointestinal (GI) tract (a part of the human digestive system). The purpose of this task is to develop methods to aid medical doctors performing routine endoscopy inspections of the GI tract. In this paper, we give a detailed description of the four different tasks of this year’s challenge, present the datasets used for training and testing, and discuss how each submission is evaluated both qualitatively and quantitatively.
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | The ACM International Conference on Multimedia (ACM MM) |
Pagination | 2563-2567 |
Date Published | 10/2019 |
Publisher | ACM Press |
Place Published | New York, New York, USA |
ISBN Number | 9781450368896 |
URL | http://dl.acm.org/citation.cfm?doid=3343031, http://dl.acm.org/citation.cf... |
DOI | 10.1145/3343031, 10.1145/3343031.3356058 |
Automatic Hyperparameter Optimization for Transfer Learning on Medical Image Datasets Using Bayesian Optimization
In 13th International Symposium on Medical Information and Communication Technology (ISMICT). IEEE, 2019. Status: Published
Affiliation | Machine Learning |
Project(s) | No Simula project, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | 13th International Symposium on Medical Information and Communication Technology (ISMICT) |
Pagination | 1-6 |
Publisher | IEEE |
DOI | 10.1109/ISMICT.2019.8743779 |
Performance of Data Enhancements and Training Optimization for Neural Network – A Polyp Detection Case Study
In IEEE CBMS International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2019. Status: Published
Deep learning using neural networks is becoming more and more popular. It is frequently used in areas like video analysis, image retrieval, traffic forecasting and speech recognition. In this respect, the learning and training process usually requires a lot of data. However, in many areas data is scarce, which is definitely the case in our medical application scenario, i.e., polyp detection in the gastrointestinal tract. Here, colorectal cancer is among the most common cancer types, and the cancer often arises from benign, adenomatous polyps containing dysplastic cells. Detection and removal of polyps can therefore prevent the development of cancer. Due to the high cost, time consumption, patient discomfort and inaccuracy of existing procedures, researchers have started to explore systems for automatic polyp detection to assist and automate current examination procedures. Following the recent traction gained by neural networks, and the typical lack of medical data, we explore how data enhancements affect the training and evaluation of the networks in terms of polyp detection accuracy, and particularly whether they can be used to increase the detection rate. We also experiment with how various training techniques can be used to increase performance. Our experimental results show how data enhancement and training optimization can be used to increase different aspects of the performance, but we also point out mechanisms that have no effect or even a negative effect.
Affiliation | Communication Systems, Machine Learning |
Project(s) | No Simula project, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | IEEE CBMS International Symposium on Computer-Based Medical Systems (CBMS) |
Publisher | IEEE |
Saga: An Open Source Platform for Training Machine Learning Models and Community-driven Sharing of Techniques
In International Conference on Content-Based Multimedia Indexing (CBMI 2019). IEEE, 2019. Status: Published
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | International Conference on Content-Based Multimedia Indexing (CBMI 2019) |
Pagination | 1-4 |
Publisher | IEEE |
DOI | 10.1109/CBMI.2019.8877455 |
Semantic Analysis of Soccer News for Automatic Game Event Classification
In 2019 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, 2019. Status: Published
We are today overwhelmed with information, of which news is an important part. Sports news in particular has become very popular, and soccer makes up a big part of this coverage. For sports fans, it can be time-consuming and tedious to keep up with the news that they really care about. In this paper, we apply different machine learning methods to soccer news from a Norwegian newspaper and a TV station's news site to summarize the content in a short and digestible manner. We present a system to collect, index, label, analyze, and present the collected news articles based on their content. We perform a thorough comparison between deep learning and traditional machine learning algorithms on text classification. Furthermore, we present a dataset of soccer news collected from two different Norwegian news sites and shared online.
Affiliation | Machine Learning |
Project(s) | Simula Metropolitan Center for Digital Engineering, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | 2019 International Conference on Content-Based Multimedia Indexing (CBMI) |
Publisher | IEEE |
Summarizing E-Sports Matches and Tournaments: The Example of Counter-Strike: Global Offensive
In International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE). ACM, 2019. Status: Published
That video and computer games have reached the masses is a well-known fact. Furthermore, game streaming and watching other people play video games is another phenomenon that has far outgrown its small beginnings, and game streams, whether live or recorded, are today viewed by millions. E-sports is the result of organized leagues and tournaments in which players can compete in controlled environments and viewers can experience the matches, discuss and criticize, just like in physical sports. However, as in traditional sports, e-sports matches may be long and contain less interesting parts, introducing the challenge of producing well-directed summaries and highlights. In this paper, we describe our efforts to approach the game streaming and e-sports phenomena from a multimedia research point of view. We focus on the challenge of summarizing matches from a specific, relevant game, Counter-Strike: Global Offensive (CS:GO). We survey related work, describe the rules and structure of the game and identify the main challenges for summarizing e-sports matches. With this contribution, we aim to foster multimedia research in the area of e-sports and game streaming.
Affiliation | Communication Systems, Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE) |
Publisher | ACM |
Using 2D and 3D Convolutional Neural Networks to Predict Semen Quality
In MediaEval. CEUR Workshop Proceedings, 2019. Status: Published
In this paper, we present the approach of team Jmag to solve this year's Medico Multimedia Task as part of the MediaEval 2019 Benchmark. This year, the task focuses on automatically determining quality characteristics of human sperm through the analysis of microscopic videos of human semen and associated patient data. Our approach is based on deep convolutional neural networks (CNNs) of varying sizes and dimensions. Here, we aim to analyze both the spatial and temporal information present in the videos. The results show that the method holds promise for predicting the motility of sperm, but predicting morphology appears to be more difficult.
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2019 |
Conference Name | MediaEval |
Publisher | CEUR Workshop Proceedings |
Talks, invited
Dynamic Sharing of GPUs and IO in a PCIe Network
In GPU Technology Conference, San Jose, CA, USA. Nvidia, 2019. Status: Published
Learn how GPUs, NVMe drives and other IO devices can be efficiently shared in a PCI Express network using SmartIO from Dolphin Interconnect Solutions.
Traditionally, IO devices are statically assigned to a single root complex (host machine), and features such as hot-add, device migration and remote access are not supported flexibly without complex software frameworks. SmartIO eliminates these restrictions and provides a flexible framework for handling PCIe devices and systems. Devices such as GPUs, NVMe drives and other IO devices can be flexibly accessed from remote systems.
We demonstrate how SmartIO is implemented using standard PCIe and Non-Transparent Bridging, and show that our system achieves near-native performance when moving data between borrowed GPUs and NVMe drives. We also show how we can dynamically add more GPUs to scale performance.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Talks, invited |
Year of Publication | 2019 |
Location of Talk | GPU Technology Conference, San Jose, CA, USA |
Publisher | Nvidia |
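The talk names Non-Transparent Bridging as the building block. As general background, an NTB exposes a window in one host's PCIe address space and translates accesses that fall inside it to a configured base address in another host's address domain. The sketch below models only that offset arithmetic; the class, its field names and the addresses are hypothetical and are not part of the SmartIO API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NtbWindow:
    """Hypothetical model of one NTB translation window (illustration only)."""
    local_base: int   # start of the window in the local host's PCIe address space
    size: int         # window size in bytes
    remote_base: int  # address the window start maps to in the remote address domain

    def translate(self, local_addr: int) -> int:
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"{local_addr:#x} is outside this NTB window")
        # The offset is preserved; only the base changes across the bridge.
        return self.remote_base + offset


# Example: a 1 MiB window whose accesses land in a made-up remote device region.
window = NtbWindow(local_base=0x2000_0000, size=0x10_0000, remote_base=0xF000_0000)
print(hex(window.translate(0x2000_1234)))
```

Real NTB hardware also imposes alignment and window-size constraints that this sketch ignores.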
Journal Article
Efficient Live and On-Demand Tiled HEVC 360 VR Video Streaming
International Journal of Semantic Computing 13, no. 3 (2019): 367-391. Status: Published
360° panorama video displayed through virtual reality (VR) glasses or large screens offers immersive user experiences, but as such technology becomes commonplace, the need for efficient streaming methods of such high-bitrate videos arises. In this respect, the attention that 360° panorama video has received lately is huge. Many methods have already been proposed, and in this paper, we shed more light on the different trade-offs in order to save bandwidth while preserving the video quality in the user's field-of-view (FoV). Using 360° VR content delivered to a Gear VR head-mounted display with a Samsung Galaxy S7 and to a Huawei Q22 set-top box, we have tested various tiling schemes analyzing the tile layout, the tiling and encoding overheads, mechanisms for faster quality switching beyond the DASH segment boundaries and quality selection configurations. In this paper, we present an efficient end-to-end design and real-world implementation of such a 360° streaming system. Furthermore, in addition to researching an on-demand system, we also go beyond the existing on-demand solutions and present a live streaming system which strikes a trade-off between bandwidth usage and the video quality in the user's FoV. We have created an architecture that combines RTP and DASH, and our system multiplexes a single HEVC hardware decoder to provide faster quality switching than at the traditional GOP boundaries. We demonstrate the performance and illustrate the trade-offs through real-world experiments where we can report comparable bandwidth savings to existing on-demand approaches, but with faster quality switches when the FoV changes.
Affiliation | Communication Systems, Machine Learning |
Project(s) | No Simula project, Department of Holistic Systems |
Publication Type | Journal Article |
Year of Publication | 2019 |
Journal | International Journal of Semantic Computing |
Volume | 13 |
Issue | 3 |
Number | 3 |
Pagination | 367-391 |
Publisher | World Scientific |
Flexible device compositions and dynamic resource sharing in PCIe interconnected clusters using Device Lending
Cluster Computing 22, no. 86 (2019): 1-24. Status: Published
Modern workloads often exceed the processing and I/O capabilities provided by resource virtualization, requiring direct access to the physical hardware in order to reduce latency and computing overhead. For computers interconnected in a cluster, access to remote hardware resources often requires facilitation both in hardware and specialized drivers with virtualization support. This limits the availability of resources to specific devices and drivers that are supported by the virtualization technology being used, as well as what the interconnection technology supports. For PCI Express (PCIe) clusters, we have previously proposed Device Lending as a solution for enabling direct low latency access to remote devices. The method has extremely low computing overhead and does not require any application- or device-specific distribution mechanisms. Any PCIe device, such as network cards, disks, and GPUs, can easily be shared among the connected hosts. In this work, we have extended our solution with support for a virtual machine (VM) hypervisor. Physical remote devices can be “passed through” to VM guests, enabling direct access to physical resources while still retaining the flexibility of virtualization. Additionally, we have also implemented multi-device support, enabling shortest-path peer-to-peer transfers between remote devices residing in different hosts. Our experimental results show that multiple remote devices can be used, achieving bandwidth and latency close to native PCIe, and without requiring any additional support in device drivers. I/O intensive workloads run seamlessly using both local and remote resources. With our added VM and multi-device support, Device Lending offers highly customizable configurations of remote devices that can be dynamically reassigned and shared to optimize resource utilization, thus enabling a flexible composable I/O infrastructure for VMs as well as bare-metal machines.
Affiliation | Communication Systems, Machine Learning |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, LADIO: Live Action Data Input/Output, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Journal Article |
Year of Publication | 2019 |
Journal | Cluster Computing |
Volume | 22 |
Issue | 86 |
Pagination | 1-24 |
Date Published | 09/2019 |
Publisher | Springer |
ISSN | 1573-7543 |
URL | https://link.springer.com/article/10.1007/s10586-019-02988-0 |
DOI | 10.1007/s10586-019-02988-0 |
Poster
Efficient Processing of Medical Videos in a Multi-auditory Environment Using GPU Lending
NVIDIA's GPU Technology Conference (GTC), 2019. Status: Published
Affiliation | Software Engineering |
Project(s) | No Simula project, Department of Holistic Systems |
Publication Type | Poster |
Year of Publication | 2019 |
Place Published | NVIDIA's GPU Technology Conference (GTC) |
Proceedings, refereed
Automatic Hyperparameter Optimization in Keras for the MediaEval 2018 Medico Multimedia Task
In Working Notes Proceedings of the MediaEval 2018 Workshop. CEUR Workshop Proceedings (CEUR-WS.org), 2018. Status: Published
This paper details the Rune team's approach to the MediaEval 2018 Medico Multimedia Task. Our approach uses a work-in-progress hyperparameter optimization system called Saga. Saga is a system for finding the best hyperparameters in Keras, a popular machine learning framework, using Bayesian optimization and transfer learning. In addition to optimizing the Keras classifier configuration, we try manipulating the dataset by adding extra images to a class with too few images and by splitting a commonly misclassified class into two classes.
Affiliation | Machine Learning |
Project(s) | Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | Working Notes Proceedings of the MediaEval 2018 Workshop |
Publisher | CEUR Workshop Proceedings (CEUR-WS.org) |
Keywords | automatic hyperparameter optimization, Bayesian optimization, CNN, convolutional neural networks, dataset manipulation, gpyopt, hyperparameter optimization, keras, saga, tensorflow, Transfer Learning |
Autonomic Adaptation of Multimedia Content Adhering to Application Mobility
In Distributed Applications and Interoperable Systems (DAIS 2018). Lecture Notes in Computer Science ed. Vol. 10853. Springer, Cham, 2018. Status: Published
Today, many users of multimedia applications are surrounded by a changing set of multimedia-capable devices. However, users can move their running multimedia applications only to a predefined set of devices. Application mobility is the paradigm where users can move their running applications (or parts of them) to heterogeneous devices in a seamless manner. In order to continue multimedia processing under the implied context changes in application mobility, applications need to adapt the presentation of multimedia content and their internal configuration. We propose DAMPAT, a system that implements an adaptation control loop to adapt multimedia pipelines. The exponential combinatorial growth of possible pipeline configurations is controlled by architectural constraints specified as high-level goals by application developers. Our evaluation shows that the pipeline only needs to be interrupted for a few tens of milliseconds to perform the reconfiguration. Thus, production or consumption of multimedia content can continue across heterogeneous devices and user context changes in a seamless manner.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | Distributed Applications and Interoperable Systems (DAIS 2018) |
Volume | 10853 |
Edition | Lecture Notes in Computer Science |
Pagination | 153-168 |
Date Published | 06/2018 |
Publisher | Springer, Cham |
ISBN Number | 978-3-319-93766-3 |
DOI | 10.1007/978-3-319-93767-0_11 |
Dynamic Adaptation of Multimedia Presentations for Videoconferencing in Application Mobility
In IEEE International Conference on Multimedia and Expo (ICME). San Diego, CA, USA: IEEE, 2018. Status: Published
Application mobility is the paradigm where users can move their running applications to heterogeneous devices in a seamless manner. This mobility involves dynamic context changes of hardware, network resources, user environment, and user preferences. In order to continue multimedia processing under these context changes, applications need to adapt not only the collection of media streams, i.e., the multimedia presentation, but also their internal configuration to work on different hardware. We present a performance analysis of adapting a video-conferencing prototype application with a proposed adaptation control loop that autonomously adapts multimedia pipelines. Results show that the time spent to create an adaptation plan and execute it is in the order of hundreds of milliseconds. The reconfiguration of pipelines, compared to building them from scratch, is approximately 1000 times faster when re-utilizing already instantiated hardware-dependent components. Therefore, we conclude that the adaptation of multimedia pipelines is a feasible approach for multimedia applications that adhere to application mobility.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | IEEE International Conference on Multimedia and Expo (ICME) |
Date Published | 07/2018 |
Publisher | IEEE |
Place Published | San Diego, CA, USA |
ISSN Number | 1945-7871 |
DOI | 10.1109/ICME.2018.8486565 |
Flexible Device Sharing in PCIe Clusters using Device Lending
In International Conference on Parallel Processing Companion (ICPP'18 Comp). ACM, 2018. Status: Published
Processing workloads may have very high IO demands, exceeding the capabilities provided by resource virtualization and requiring direct access to the physical hardware. For computers that are interconnected in PCI Express (PCIe) networks, we have previously proposed Device Lending as a solution for assigning devices to remote hosts. In this paper, we explain how we have extended our implementation with support for the Linux Kernel-based Virtual Machine (KVM) hypervisor. Using our extended Device Lending, it becomes possible to dynamically “pass through” physical remote devices to VM guests while still retaining the flexibility of virtualization, something that previously required extensive facilitation in both hypervisor and device drivers in the form of paravirtualization.
We have also improved our original implementation with support for interoperability between remote devices. We show that it is possible to use multiple devices residing in different hosts, while still achieving the same bandwidth and latency as native PCIe, and without requiring any additional support in device drivers.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, LADIO: Live Action Data Input/Output, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | International Conference on Parallel Processing Companion (ICPP'18 Comp) |
Date Published | 08/2018 |
Publisher | ACM |
ISBN Number | 978-1-4503-6523-9/18/08 |
DOI | 10.1145/3229710.3229759 |
Tradeoffs using Binary and Multiclass Neural Network Classification for Medical Multidisease Detection
In 2018 IEEE International Symposium on Multimedia (ISM). IEEE, 2018. Status: Published
The interest in neural networks has increased significantly, and the applications of this type of machine learning are vast, ranging from natural image classification to medical image segmentation. However, many users of neural networks tend to use them as a black-box tool. They do not assess all of the possible variations, nor take into account the respective classification accuracies and costs. In our work, we focus on multiclass image classification, and in this research, we shed light on the trade-offs between systems using a single multiclass classifier and multiple binary classifiers, respectively. We have tested these classifiers on several modern neural network architectures, including DenseNet, Inception v3, Inception ResNet v2, Xception, NASNet and MobileNet. We have compared several aspects of the performance of these architectures during training and testing using both classification styles. We have compared classification speed and several classification accuracy metrics. Here, we present the results from experiments on a total of 99 networks: 11 multiclass and 88 individual binary networks, for an 8-class classification of medical images. In short, using multiple binary classification networks resulted in a 7% increase in the average F1 score, a 1% increase in average accuracy, a 1% increase in precision, and a 4% increase in average recall. However, on average, such a multi-network style performed the classification 7.6 times slower compared to a single-network multiclass implementation. These collective findings show that both approaches can be applied to modern neural network structures. Several binary networks will often give increased classification accuracy, but at the cost of classification speed and resource consumption.
Affiliation | Communication Systems, Machine Learning |
Project(s) | No Simula project |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | 2018 IEEE International Symposium on Multimedia (ISM) |
Pagination | 1-8 |
Date Published | 12/2018 |
Publisher | IEEE |
DOI | 10.1109/ISM.2018.00009 |
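The two classification styles compared above differ mainly in how many networks run per image. A minimal sketch, with hypothetical callables standing in for trained networks: the multiclass style makes one forward pass and takes the arg-max of a per-class probability vector, while the one-vs-rest style runs one binary network per class and picks the class whose network is most confident. The arg-max rule for combining the binary outputs is an assumption for illustration; the paper may combine them differently.

```python
from typing import Callable, Dict, List, Sequence

Image = object  # placeholder type; any input the models accept


def multiclass_predict(image: Image,
                       model: Callable[[Image], Sequence[float]],
                       labels: List[str]) -> str:
    # One network, one forward pass, one probability per class.
    probs = model(image)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]


def one_vs_rest_predict(image: Image,
                        binary_models: Dict[str, Callable[[Image], float]]) -> str:
    # One binary network per class ("this class vs. all others"),
    # so an 8-class problem needs 8 forward passes per image.
    scores = {label: model(image) for label, model in binary_models.items()}
    return max(scores, key=scores.__getitem__)


labels = ["polyp", "esophagitis", "normal"]  # hypothetical class names
print(multiclass_predict(None, lambda img: [0.1, 0.7, 0.2], labels))
print(one_vs_rest_predict(None, {"polyp": lambda img: 0.4,
                                 "esophagitis": lambda img: 0.3,
                                 "normal": lambda img: 0.9}))
```

In the one-vs-rest style the per-image cost grows linearly with the number of classes, which helps explain the speed and resource penalty reported in the abstract.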
Book Chapter
Camera Synchronization for Panoramic Videos
In MediaSync, 565-592. Springer, 2018. Status: Published
Affiliation | Communication Systems |
Project(s) | Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Department of Holistic Systems |
Publication Type | Book Chapter |
Year of Publication | 2018 |
Book Title | MediaSync |
Pagination | 565-592 |
Date Published | 03/2018 |
Publisher | Springer |
URL | https://doi.org/10.1007/978-3-319-65840-7_20 |
DOI | 10.1007/978-3-319-65840-7_20 |
Talks, contributed
SmartIO: Dynamic Sharing of GPUs and IO in a PCIe Cluster
In GPU Technology Conference, San Jose, CA, USA. Nvidia, 2018. Status: Published
Learn how GPUs, NVMe drives and other IO devices can be efficiently shared in a PCI Express cluster using SmartIO from Dolphin Interconnect Solutions. Traditionally, IO devices have been statically assigned to a single root complex (host machine), and features such as hot-add, device migration and remote access are not supported in a flexible way without complex software frameworks. Dolphin SmartIO eliminates these restrictions and provides a flexible framework for handling PCIe devices and systems. Devices such as GPUs, NVMe drives and other IO devices can be flexibly accessed from remote systems. We demonstrate how SmartIO is implemented using standard PCIe and Non-Transparent Bridging, show that our system gets near-native performance when moving data from local GPUs to remote NVMe drives, and show how we can dynamically add more GPUs to scale performance.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing |
Publication Type | Talks, contributed |
Year of Publication | 2018 |
Location of Talk | GPU Technology Conference, San Jose, CA, USA |
Publisher | Nvidia |
Proceedings, refereed
A Holistic Multimedia System for Gastrointestinal Tract Disease Detection
In 8th annual ACM conference on Multimedia Systems (MMSys). ACM, 2017. Status: Published
Analysis of medical videos for detection of abnormalities and diseases requires both high precision and recall, but also real-time processing for live feedback and scalability for massive screening of entire populations. Existing work in this field does not provide the necessary combination of retrieval accuracy and performance.
In this paper, a multimedia system is presented where the aim is to tackle automatic analysis of videos from the human gastrointestinal (GI) tract. The system includes the whole pipeline from data collection, processing and analysis, to visualization. The system combines filters using machine learning, image recognition and extraction of global and local image features. Furthermore, it is built in a modular way so that it can easily be extended. At the same time, it is developed for efficient processing in order to provide real-time feedback to the doctors. Our experimental evaluation proves that our system has detection and localisation accuracy at least as good as existing systems for polyp detection, it is capable of detecting a wider range of diseases, it can analyze video in real-time, and it has a low resource consumption for scalability.
Affiliation | Communication Systems |
Project(s) | Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization |
Publication Type | Proceedings, refereed |
Year of Publication | 2017 |
Conference Name | 8th annual ACM conference on Multimedia Systems (MMSys) |
Pagination | 112-123 |
Date Published | 06/2017 |
Publisher | ACM |
ISBN Number | 978-1-4503-5002-0 |
URL | http://dl.acm.org/citation.cfm?id=3083189 |
DOI | 10.1145/3083187.3083189 |
Load Balancing of Multimedia Workloads for Energy Efficiency on the Tegra K1 Multicore Architecture
In 8th annual ACM conference on Multimedia Systems (MMSys). ACM, 2017. Status: Published
Energy efficiency is a timely topic for modern mobile computing. Reducing the energy consumption of devices not only increases their battery lifetime, but also reduces the risk of hardware failure. Many researchers strive to understand the relationship between software activity and hardware power usage. A recurring strategy for saving power is to reduce operating frequencies. It is widely acknowledged that standard frequency scaling algorithms generally overreact to changes in hardware utilisation. More recent and original efforts attempt to balance software workloads on heterogeneous multicore architectures, such as the Tegra K1, which includes a quad-core CPU and a CUDA-capable GPU. However, it is not known whether it is possible to utilise these processor elements in parallel to save energy. Unfortunately, research into these types of systems is often evaluated with the Performance Per Watt (PPW) metric, which is inaccurate because it ignores constant power usage from idle components. We show that relying on this metric can end up increasing energy usage on the Tegra K1, and give a false impression of how such systems consume energy. In reality, we show that it is much harder to save energy by balancing workloads between the heterogeneous cores of the Tegra K1, where we demonstrate only a 5% energy saving by offloading 10% of the DCT workload from the GPU to the CPU. Significantly more energy can be saved (up to 50%) by using the appropriate processor for different workloads.
Affiliation | Communication Systems |
Project(s) | Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources |
Publication Type | Proceedings, refereed |
Year of Publication | 2017 |
Conference Name | 8th annual ACM conference on Multimedia Systems (MMSys) |
Pagination | 124-135 |
Date Published | 06/2017 |
Publisher | ACM |
ISBN Number | 978-1-4503-5002-0 |
URL | http://dl.acm.org/citation.cfm?doid=3083187.3083195 |
DOI | 10.1145/3083187.3083195 |
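The argument against PPW can be made concrete with a small worked example. The numbers below are invented for illustration and are not measurements from the paper: a fixed static power of 2 W stands in for the constant draw of idle components, and two hypothetical configurations finish the same amount of work. Because PPW divides throughput by the active power only, it equals work per unit of active energy, so a slower configuration with lower active power can score better on PPW while drawing more total energy.

```python
from dataclasses import dataclass


@dataclass
class Run:
    work_units: float      # work completed, e.g. number of DCT blocks
    runtime_s: float       # wall-clock duration of the run
    active_power_w: float  # power of the components doing the work


def perf_per_watt(run: Run) -> float:
    # Throughput divided by active power; constant platform power is not counted.
    return (run.work_units / run.runtime_s) / run.active_power_w


def total_energy_j(run: Run, static_power_w: float) -> float:
    # Energy actually drawn: static power is paid for the whole runtime as well.
    return (run.active_power_w + static_power_w) * run.runtime_s


STATIC_W = 2.0              # illustrative constant draw of idle components
fast = Run(100, 10.0, 4.0)  # hypothetical config A: faster, higher active power
slow = Run(100, 15.0, 2.5)  # hypothetical config B: slower, lower active power

for name, run in (("A", fast), ("B", slow)):
    print(name, round(perf_per_watt(run), 3), total_energy_j(run, STATIC_W))
```

Configuration B looks about 7% better by PPW (roughly 2.67 vs. 2.5 work units per joule of active energy), yet it consumes 67.5 J instead of 60 J, because the 2 W static draw is paid for five extra seconds.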
Poster
A High Precision Power Model for the Tegra K1 CPU, GPU and RAM
In GPU Technology Conference 2016. Nvidia, 2016. Status: Published
Power modelling is an important topic in many areas of computing, for example to save energy in texture streaming for gaming [1] or to select efficient H.264 video encoding parameters [2]. However, researchers' view of how hardware consumes power is limited. They typically resort to rate-based models to describe the energy consumption of hardware, where power usage is correlated directly with hardware access rates (for example instructions or cache misses per second) [3,4,5,6]. This approach ignores many mechanisms that impact the power usage of a system, such as rail voltages, core and clock gating, frequency scaling and the variable cost of instruction execution. Because of this, such models can mispredict by up to 70% on the Tegra K1. We show that by taking all these factors into account with sufficient hardware knowledge, it is possible to bridge the gap between power usage and software execution and build power models which are over 98% accurate over all CPU, GPU and memory frequency combinations.
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2016 |
Secondary Title | GPU Technology Conference 2016 |
Date Published | 04/2016 |
Publisher | Nvidia |
URL | http://on-demand.gputechconf.com/gtc/2016/posters/images/1920x1607/GTC_2... |
Proceedings, refereed
A High-Precision, Hybrid GPU, CPU and RAM Power Model for Generic Multimedia Workloads
In 7th annual ACM conference on Multimedia Systems (MMSys). ACM, 2016. Status: Published
Energy efficiency of multimedia processing is a hot topic in modern mobile computing, where the lifetime of battery-powered devices is limited. Authors often use power models as tools to evaluate the energy efficiency of multimedia workloads and processing schemes. A challenge with these models is that they are built without sufficiently deep hardware knowledge, and as a result they can mispredict substantially depending on the hardware configuration. Typical rate-based power models can, for example, mispredict by up to 70 % on the Tegra K1 SoC. Inspired by multimedia workloads, we introduce a modelling methodology which can be used to build a generic, high-precision power model for the Tegra K1's GPU and memory. By considering hardware utilisation, rail voltages, leakage currents and clocks, the model achieves an average accuracy above 99 % over all operating frequencies, and it has been rigorously tested on several multimedia workloads. Our method exposes detailed insight into the hardware and how it consumes energy. This knowledge is not only useful for researchers to understand how power models should be built, but also helps developers understand what they can do to minimise power usage. For example, experiments show that for a DCT benchmark, 3 % power can be saved by using non-coherent caches and smaller datatypes.
Affiliation | Communication Systems |
Project(s) | No Simula project |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | 7th annual ACM conference on Multimedia Systems (MMSys) |
Pagination | 14:1-14:12 |
Date Published | 05/2016 |
Publisher | ACM |
ISBN Number | 978-1-4503-4297-1 |
DOI | 10.1145/2910017.2910591 |
Computer Aided Disease Detection System for Gastrointestinal Examinations
In Multimedia Systems Conference 2016. New York: ACM, 2016. Status: Published
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | Multimedia Systems Conference 2016 |
Date Published | 05/2016 |
Publisher | ACM |
Place Published | New York |
Device Lending in PCI Express Networks
In Proceedings of the 26th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV). ACM, 2016. Status: Published
The challenge of scaling the IO performance of multimedia systems to the demands of their users has attracted much research. A lot of effort has gone into developing distributed systems that add little latency and computing overhead. For machines in PCI Express (PCIe) clusters, we propose Device Lending as a novel solution which works at the system level. Device Lending achieves low latency and extremely low computing overhead without requiring any application-specific distribution mechanisms. To applications, the remote IO resource appears local. In fact, even the drivers of the operating system remain unaware that hardware resources are located in remote machines. By enabling machines in a PCIe cluster to lend a wide variety of hardware, cluster machines can get temporary access to a pool of IO resources. Network cards, FPGAs, SSDs, and even GPUs can easily be shared among computers. Our proposed solution, Device Lending, works transparently without requiring any modifications to drivers, operating systems or software applications.
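The project summary describes building on PCIe non-transparent bridging (NTB). The core idea behind that transparency is address translation: accesses that fall inside a window in the borrower's address space are forwarded to a corresponding range on the lending machine. The sketch models only that arithmetic. `NtbWindow`, `route` and all addresses are hypothetical names and values; they do not correspond to any Dolphin or Linux API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NtbWindow:
    """Hypothetical model of one NTB translation window."""
    local_base: int   # start of the window in the borrower's PCIe address space
    size: int         # window length in bytes
    remote_base: int  # address the window maps to on the lending machine

    def contains(self, addr: int) -> bool:
        return self.local_base <= addr < self.local_base + self.size

    def translate(self, addr: int) -> int:
        """Map a local bus address to the remote address it is forwarded to."""
        if not self.contains(addr):
            raise ValueError(f"address {addr:#x} outside window")
        return self.remote_base + (addr - self.local_base)

def route(windows, addr):
    """Return the remote address for addr, or None if it targets local memory."""
    for w in windows:
        if w.contains(addr):
            return w.translate(addr)
    return None

# Example: a borrowed device's BAR appears locally at 0x9000_0000 but lives
# at 0xF000_0000 on the lender (all addresses hypothetical).
bar_window = NtbWindow(local_base=0x9000_0000, size=0x10_0000, remote_base=0xF000_0000)
print(hex(route([bar_window], 0x9000_1000)))  # forwarded to the lender
print(route([bar_window], 0x1000))            # not in any window: local access
```

This prints `0xf0001000` followed by `None`. A DMA-capable device would also need the reverse direction, a window through which the device on the lender can reach the borrower's memory.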
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | Proceedings of the 26th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV) |
Pagination | 10:1-10:6 |
Date Published | 05/2016 |
Publisher | ACM |
ISBN Number | 978-1-4503-4356-5 |
DOI | 10.1145/2910642.2910650 |
Efficient Processing of Videos in a Multi Auditory Environment Using Device Lending of GPUs
In The 7th International Conference on Multimedia Systems (MMSys). ACM, 2016. Status: Published
Affiliation | Communication Systems |
Project(s) | No Simula project |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | The 7th International Conference on Multimedia Systems (MMSys) |
Pagination | 36:1-36:4 |
Date Published | 05/2016 |
Publisher | ACM |
ISBN Number | 978-1-4503-4297-1 |
DOI | 10.1145/2910017.2910636 |
High-Precision Power Modelling of the Tegra K1 Variable SMP Processor Architecture
In 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). IEEE, 2016. Status: Published
Energy efficiency is an important issue for many embedded systems, where limited battery lifetime and power-hungry hardware constrain the usefulness of such devices. Modern Systems-on-Chip (SoCs) such as the Tegra K1 employ advanced power management capabilities, such as two CPU clusters, clock gating, power gating and dynamic frequency tuning, to meet application demands. At design time or at runtime, it is challenging for system architects and software developers to understand the effects these mechanisms have on power and performance in all parts of the system, especially because the power usage of cores, caches, memory and other hardware components cannot be measured directly. Rate-based power models are often proposed as a solution, but unfortunately these can mispredict substantially on the Tegra K1, by up to 30 %. In this paper, we propose a power modelling method for the Tegra K1 CPU which overcomes the limitations of the most common types of models found in the literature, yet still only requires power measurements of the whole board. Through extensive empirical validation, we demonstrate an accuracy close to 100 %. Preliminary experiments show that our methodology can capture the instruction power of individual system processes and applications, and produce detailed power breakdowns of all components in the system.
Affiliation | Communication Systems |
Project(s) | No Simula project |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) |
Pagination | 193-200 |
Date Published | 09/2016 |
Publisher | IEEE |
DOI | 10.1109/MCSoC.2016.28 |
Right inflight? A dataset for exploring the automatic prediction of movies suitable for a watching situation
In Multimedia Systems Conference 2016. New York: ACM, 2016. Status: Published
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | Multimedia Systems Conference 2016 |
Date Published | 05/2016 |
Publisher | ACM |
Place Published | New York |
Journal Article
An Experimental Evaluation of Debayering Algorithms on GPUs for Recording Panoramic Video in Real-time
International Journal of Multimedia Data Engineering and Management (IJMDEM) 6, no. 3 (2015): 1-16. Status: Published
Modern video cameras often capture only a single color per pixel in a single-pass operation. This process is called filtering: pixels are filtered through a color filter array, and the Bayer filter is perhaps the most common filter used today. This means that the missing color channels in the image or video frame must be restored in a post-processing step, a process referred to as debayering. In a live video scenario, this operation must be performed efficiently in order to output each video frame in real time, while also yielding acceptable visual quality. Here, we evaluate debayering algorithms implemented on a GPU for real-time panoramic video recordings using multiple 2K-resolution cameras.
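As a reference point for what debayering involves, the following sketches plain bilinear interpolation, one of the simplest demosaicing algorithms, on a Bayer mosaic. It is a pure-Python CPU reference rather than a GPU kernel, it does not reproduce any particular algorithm evaluated in the article, and the RGGB layout is an assumption.

```python
def bayer_channel(y: int, x: int) -> int:
    """Channel sampled at (y, x) in an RGGB mosaic: 0=R, 1=G, 2=B."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2

def debayer_bilinear(raw):
    """Bilinear demosaicing: each missing channel is the mean of the
    same-channel samples in the 3x3 neighbourhood (clipped at borders)."""
    h, w = len(raw), len(raw[0])
    if h < 2 or w < 2:
        raise ValueError("mosaic must be at least 2x2")
    out = [[[0.0, 0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sums = [0.0, 0.0, 0.0]
            counts = [0, 0, 0]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        c = bayer_channel(ny, nx)
                        sums[c] += raw[ny][nx]
                        counts[c] += 1
            own = bayer_channel(y, x)
            for c in range(3):
                # Keep the measured value; interpolate the two missing ones.
                out[y][x][c] = float(raw[y][x]) if c == own else sums[c] / counts[c]
    return out

# Usage: a mosaic sampled from a uniform colour (R=10, G=20, B=30)
# should be reconstructed as that colour everywhere.
raw = [[(10, 20, 30)[bayer_channel(y, x)] for x in range(4)] for y in range(4)]
print(debayer_bilinear(raw)[1][1])
```

The usage line prints `[10.0, 20.0, 30.0]`. In the interior this reduces to the familiar bilinear rules: green at a red or blue site is the mean of four edge neighbours, and blue at a red site is the mean of four diagonals.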
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | International Journal of Multimedia Data Engineering and Management (IJMDEM) |
Volume | 6 |
Issue | 3 |
Pagination | 1-16 |
Date Published | 07/2015 |
Publisher | IGI Global |
ISSN | 1947-8534 |
DOI | 10.4018/ijmdem.2015070101 |
Using a Commodity Hardware Video Encoder for Interactive Applications
International Journal of Multimedia Data Engineering and Management (IJMDEM) 6, no. 3 (2015): 17-31. Status: Published
In recent years, video streaming has become one of the most dominant Internet services. Due to the increased availability of high-speed Internet access, multimedia services are becoming more interactive. Examples of such applications include cloud gaming and systems where users can interact with high-resolution content. During the last few years, programmable hardware video encoders have been built into commodity hardware such as CPUs and GPUs. We evaluate one of these encoders in a scenario where individual streams are delivered to each end user. Our results show that the visual video quality and the frame size of the hardware-based encoder are comparable to those of a software-based approach. To evaluate a complete system, we have implemented our proposed streaming pipeline in Quake III. We found that running the game on a remote server and streaming the video output to a client web browser in a typical home environment is both possible and enjoyable. The interaction latency is measured to be less than 90 ms, which is below what has been reported for OnLive in a similar environment.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2015 |
Journal | International Journal of Multimedia Data Engineering and Management (IJMDEM) |
Volume | 6 |
Issue | 3 |
Pagination | 17-31 |
Date Published | 07/2015 |
Publisher | IGI Global |
ISSN | 1947-8534 |
DOI | 10.4018/ijmdem.2015070102 |
Poster
Energy and Performance Optimization of a Simple Video Encoder on the Jetson-TK1
In GPU Technology Conference 2015. Nvidia, 2015. Status: Published
This poster analyses the energy consumption of a simple video encoder running on NVIDIA's Tegra K1 processor. The total energy consumption of the video encoder is investigated under different hardware configurations, such as which processors (CPU clusters or GPU) are used, which DVFS algorithm is active, and whether performance optimisations like NEON are implemented. We find that NEON instructions and multithreading generally have positive effects on energy consumption, saving between 25 and 40 % energy compared to a non-optimised, naive implementation. GPU offloading is found to be only marginally better than CPU execution, by 1.7 %.
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2015 |
Secondary Title | GPU Technology Conference 2015 |
Date Published | 03/2015 |
Publisher | Nvidia |
URL | http://on-demand.gputechconf.com/gtc/2015/posters/GTC_2015_Embedded_Syst... |
Proceedings, refereed
Energy Efficient Continuous Multimedia Processing Using the Tegra K1 Mobile SoC
In Proceedings of the 7th ACM International Workshop on Mobile Video (MoVid). ACM, 2015. Status: Published
Energy consumption is an important issue for mobile devices, as the development of battery technology has not kept pace with the power requirements of mobile hardware. In this paper, we use a video rotation filter to study the effects of CPU and GPU frequency scaling in terms of performance and energy. Our platform is the Tegra K1 mobile processor with a quad-core CPU and a CUDA-capable GPU. We find that most energy can be saved by minimising CPU frequency while meeting the filter's framerate requirement. Interestingly, frequency scaling affects the GPU differently: the best frequency is always moderately higher than the minimum frequency that meets the framerate requirement. Using these heuristics, it is possible to save up to 10 % energy compared to the standard Linux frequency scaling algorithms, which use processor utilisation to adjust processor frequency.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | Proceedings of the 7th ACM International Workshop on Mobile Video (MoVid) |
Pagination | 15-16 |
Date Published | 03/2015 |
Publisher | ACM |
ISBN Number | 978-1-4503-3353-5 |
URL | http://dl.acm.org/citation.cfm?id=2727044 |
DOI | 10.1145/2727040.2727044 |
Energy Efficient Video Encoding Using the Tegra K1 Mobile Processor [Demo Paper]
In Proceedings of the 6th ACM Multimedia Systems Conference (MMSys). ACM, 2015. Status: Published
Energy consumption is an important concern for mobile devices, where battery storage capacity has not kept pace with the power requirements of modern hardware. However, innovative and flexible hardware platforms give developers better means of optimising the energy consumption of their software. For example, the Tegra K1 System-on-Chip (SoC) offers two CPU clusters, GPU offloading, frequency scaling and other mechanisms to control the power and performance of applications. In this demonstration, the scenario is live video encoding, and participants can experiment with power usage and performance using the Tegra K1's hardware capabilities. A popular power-saving approach is a “race to sleep” strategy, where the highest CPU frequency is used while the CPU has work to do, after which the CPU is put to sleep. Our own experiments indicate that an energy reduction of 28 % can be achieved by running the video encoder at the lowest CPU frequency at which it still meets the minimum frame rate of 25 Frames Per Second (FPS).
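The frequency rule described above reduces to a simple search over the available operating points. In this sketch the frequency list and frame rates are hypothetical placeholders; on a real board they would come from profiling the encoder at each CPU frequency.

```python
from typing import Dict, Optional

def lowest_sufficient_frequency(fps_by_freq: Dict[int, float], min_fps: float) -> Optional[int]:
    """Return the lowest frequency (kHz) whose measured frame rate meets min_fps."""
    for freq in sorted(fps_by_freq):
        if fps_by_freq[freq] >= min_fps:
            return freq
    return None  # no operating point is fast enough

# Hypothetical profiling results: CPU frequency in kHz -> encoder frame rate.
measured = {204000: 9.1, 696000: 19.8, 1092000: 26.3, 1428000: 33.0, 2320000: 51.7}
print(lowest_sufficient_frequency(measured, min_fps=25.0))
```

With these placeholder measurements the search returns 1092000 kHz. A race-to-sleep governor would instead run at 2320000 kHz and idle between frames; which choice saves energy depends on how power scales with frequency on the actual hardware.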
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | Proceedings of the 6th ACM Multimedia Systems Conference (MMSys) |
Date Published | 03/2015 |
Publisher | ACM |
ISBN Number | 978-1-4503-3351-1 |
URL | http://dl.acm.org/citation.cfm?id=2713186 |
DOI | 10.1145/2713168.2713186 |
Scaling Virtual Camera Services to a Large Number of Users
In Proceedings of the 6th annual ACM conference on Multimedia Systems (MMSYS). New York, NY, USA: ACM, 2015. Status: Published
By processing video footage from a camera array, one can easily make wide-field-of-view panorama videos. From the single panorama video, one can further generate multiple virtual cameras supporting personalized views to a large number of users based on only the few physical cameras in the array. However, giving personalized services to large numbers of users potentially introduces both bandwidth and processing bottlenecks, depending on where the virtual camera is processed.
In this demonstration, we present a system that addresses the large cost of transmitting the entire panorama video to end users who create the virtual views on their client devices. Our approach is to divide the panorama into tiles, each encoded in multiple qualities. The client then retrieves each panorama tile in a quality (and thus bit rate) that depends on where the virtual camera is pointing, i.e., the video quality of each tile changes dynamically according to user interaction. Our initial experiments indicate a large potential for saving bandwidth by reducing quality in the areas of the panorama frame that are not used to extract the virtual view.
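The tile-selection idea can be sketched as follows: split the panorama into a grid of tiles and request high quality only for tiles that overlap the virtual camera's footprint in panorama coordinates. The grid size, the two quality labels and the rectangular footprint are simplifying assumptions. The system described above may use more quality levels and a non-rectangular view region.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive

    def intersects(self, other: "Rect") -> bool:
        return self.x0 < other.x1 and other.x0 < self.x1 and self.y0 < other.y1 and other.y0 < self.y1

def select_tile_qualities(pano_w, pano_h, cols, rows, view: Rect):
    """Map (row, col) -> 'high' or 'low' depending on overlap with the view.
    Assumes the panorama dimensions are divisible by the grid size."""
    tile_w, tile_h = pano_w // cols, pano_h // rows
    plan = {}
    for r in range(rows):
        for c in range(cols):
            tile = Rect(c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            plan[(r, c)] = "high" if tile.intersects(view) else "low"
    return plan

# Hypothetical 4096x1024 panorama split into an 8x2 grid; the viewer looks
# at a region left of the panorama's centre.
plan = select_tile_qualities(4096, 1024, cols=8, rows=2, view=Rect(900, 300, 1700, 800))
print(sorted(k for k, q in plan.items() if q == "high"))
```

Here 6 of the 16 tiles are fetched in high quality and the remaining 10 in low quality. As the user pans, the plan is recomputed and the client switches tile qualities accordingly.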
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | Proceedings of the 6th annual ACM conference on Multimedia Systems (MMSYS) |
Pagination | 93-96 |
Date Published | 03/2015 |
Publisher | ACM |
Place Published | New York, NY, USA |
ISBN Number | 978-1-4503-3351-1 |
Why Race-to-Finish is Energy-Inefficient for Continuous Multimedia Workloads
In Proceedings of the 9th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). IEEE, 2015. Status: Published
It is often believed that a "race-to-finish" approach, where processing is finished as quickly as possible, is the best way to conserve energy on modern mobile architectures. However, earlier work shows that for continuous multimedia workloads, the best way to conserve energy is to minimise processor frequency such that application deadlines are still met. In this paper, we investigate the reasons behind this. We develop an original method to model dynamic and static power on the individual power rails of the Tegra K1 by measuring only the total power usage of the board. Our model has an average error of only 8 %. We find that the way an application's performance scales with frequency is very important for energy efficiency. We demonstrate a 37 % energy saving by minimising the processor and memory frequencies of a video processing filter such that a framerate of 20 FPS is met.
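Why a slower clock can win for a periodic workload can be shown with a back-of-the-envelope calculation. Each frame must finish within a fixed period. The processor draws active power while busy and idle power for the rest of the period. Active power here uses the textbook CMOS approximation (static V·I_leak plus dynamic C·V²·f), not the rail model from the paper. The voltage table, capacitance, idle power and cycles per frame are all hypothetical values.

```python
FRAME_PERIOD_S = 1 / 20   # 20 FPS deadline, as in the example above
CYCLES_PER_FRAME = 16e6   # hypothetical work per frame (assumes perfect frequency scaling)
IDLE_POWER_W = 0.4        # hypothetical power while idle
I_LEAK_A = 0.5            # hypothetical leakage current
C_EFF_F = 1.0e-9          # hypothetical effective switched capacitance
VF_TABLE = {400e6: 0.80, 800e6: 0.90, 1200e6: 1.00, 2000e6: 1.20}  # Hz -> V, illustrative

def energy_per_frame(freq_hz: float) -> float:
    """Energy (J) for one frame period: busy at freq_hz, then idle until the deadline."""
    v = VF_TABLE[freq_hz]
    busy_s = CYCLES_PER_FRAME / freq_hz
    if busy_s > FRAME_PERIOD_S:
        raise ValueError("frequency too low to meet the frame deadline")
    active_power = v * I_LEAK_A + C_EFF_F * v * v * freq_hz
    return active_power * busy_s + IDLE_POWER_W * (FRAME_PERIOD_S - busy_s)

feasible = [f for f in sorted(VF_TABLE) if CYCLES_PER_FRAME / f <= FRAME_PERIOD_S]
for f in feasible:
    print(f"{f / 1e6:6.0f} MHz: {energy_per_frame(f) * 1e3:.2f} mJ/frame")
best = min(feasible, key=energy_per_frame)
print(f"lowest-energy frequency: {best / 1e6:.0f} MHz")
```

Dynamic energy per frame is C·V²·cycles, so it depends on frequency only through voltage. Running faster raises V and the cost of every cycle, while the idle time it buys still costs idle power. With these numbers the lowest feasible frequency wins. A memory-bound workload, whose cycle count grows with clock speed, would shift the balance, which matches the paper's point about performance scaling.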
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2015 |
Conference Name | Proceedings of the 9th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) |
Pagination | 57-64 |
Date Published | 09/2015 |
Publisher | IEEE |
URL | http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7328187 |
DOI | 10.1109/MCSoC.2015.20 |
PhD Thesis
Processing Multimedia Workloads on Heterogeneous Multicore Architectures
PhD thesis, University of Oslo. UiO, 2015. Status: Published
Processor architectures have been evolving quickly since the introduction of the central processing unit. For a very long time, one of the important means of increasing performance was to increase the clock frequency. However, in the last decade, processor manufacturers have hit the so-called power wall, with high heat dissipation. To overcome this problem, processors were designed with reduced clock frequencies but with multiple cores and, later, heterogeneous processing elements. This shift introduced a new challenge for programmers: Legacy applications, written without parallelization in mind, gain no benefits from moving to multicore and heterogeneous architectures. Another challenge for the programmers is that heterogeneous architecture designs are very different with respect to caches, memory types, execution unit organization, and so forth and, in the worst case, a programmer must completely rewrite the application to obtain the best performance on the new architecture.
Multimedia workloads, such as video encoding, are often time sensitive and interactive. These workloads differ from traditional batch processing workloads, which have no real-time requirements. This work investigates how to use modern heterogeneous architectures efficiently to process multimedia workloads. To do so, we investigate both simple and complex workloads on multiple architectures to learn about the properties of these architectures. When programming multimedia workloads, it is very important to know how the algorithms perform on the target architecture. In addition, achieving high performance on heterogeneous architectures is not a trivial task and often requires detailed knowledge about the architecture. We therefore evaluate several optimizations to learn how best to write programs for these architectures and avoid potential pitfalls.
We later use the knowledge gained to propose a framework design and language called Parallel Processing Graph (P2G). The P2G framework is designed for multimedia workloads and supports heterogeneous architectures. To demonstrate the feasibility of the framework, we construct a proof-of-concept implementation. Two simple workloads show that we can express multimedia workloads in the system. We also demonstrate the scalability of the designed solution.
Affiliation | Communication Systems |
Project(s) | Department of Holistic Systems |
Publication Type | PhD Thesis |
Year of Publication | 2015 |
Degree awarding institution | University of Oslo |
Degree | PhD |
Date Published | 02/2015 |
Publisher | UiO |
URL | https://www.duo.uio.no/handle/10852/50618 |
Proceedings, refereed
An Evaluation of Debayering Algorithms on GPU for Real-Time Panoramic Video Recording
In IEEE International Symposium on Multimedia (ISM 2014). IEEE, 2014. Status: Published
Modern video cameras normally only capture a single color per pixel, commonly arranged in a Bayer pattern. This means that we must restore the missing color channels in the image or the video frame in post-processing, a process referred to as debayering. In a live video scenario, this operation must be performed efficiently in order to output each frame in real-time, while also yielding acceptable visual quality. Here, we evaluate debayering algorithms implemented on a GPU for real-time panoramic video recordings using multiple 2K-resolution cameras.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | IEEE International Symposium on Multimedia (ISM 2014) |
Publisher | IEEE |
Automatic event extraction and video summaries from soccer games
In Proceedings of the 5th ACM Multimedia Systems Conference on - MMSys '14. New York, New York, USA: ACM Press, 2014. Status: Published
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the 5th ACM Multimedia Systems Conference on - MMSys '14 |
Pagination | 176–179 |
Publisher | ACM Press |
Place Published | New York, New York, USA |
ISBN Number | 9781450327053 |
URL | http://dl.acm.org/citation.cfm?doid=2557642.2579374 |
DOI | 10.1145/2557642.2579374 |
Automatic Event Extraction and Video Summaries From Soccer Games
In Proceedings of the 5th annual ACM conference on Multimedia Systems (MMSYS). ACM, 2014. Status: Published
Bagadus is a prototype of a soccer analysis application which integrates a sensor system, a video camera array and soccer analytics annotations. The current prototype is installed at Alfheim Stadium in Norway, and provides a large set of new functions compared to existing solutions. One important feature is the automatic extraction of video events and summaries from games, an operation that traditionally consumes a huge amount of time. In this demo, we demonstrate how our integration of subsystems enables several types of summaries to be generated automatically, and we show that the video summaries are displayed with a response time of around one second.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the 5th annual ACM conference on Multimedia Systems (MMSYS) |
Date Published | March |
Publisher | ACM |
DOI | 10.1145/2557642.2579374 |
Automatic Real-Time Zooming and Panning on Salient Objects from a Panoramic Video
In Proceedings of the ACM International Conference on Multimedia - MM '14. New York, New York, USA: ACM Press, 2014. Status: Published
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the ACM International Conference on Multimedia - MM '14 |
Pagination | 725–726 |
Publisher | ACM Press |
Place Published | New York, New York, USA |
ISBN Number | 9781450330633 |
URL | http://dl.acm.org/citation.cfm?doid=2647868.2654882 |
DOI | 10.1145/2647868.2654882 |
Automatic Real-Time Zooming and Panning on Salient Objects From a Panoramic Video
In ACM International Conference on Multimedia. ACM, 2014. Status: Published
The proposed demo shows how our system automatically zooms and pans into tracked objects in panorama videos. At the conference site, we will set up a two-camera version of the system that generates live panorama videos, where the system zooms and pans to track people wearing colored hats. Additionally, using a stored soccer video recorded with a five-camera 2K setup at Alfheim Stadium in Tromsø during the European league game between Tromsø IL and Tottenham Hotspur, the system automatically follows the ball.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | ACM International Conference on Multimedia |
Date Published | November |
Publisher | ACM |
Keywords | Conference |
Be your own cameraman: real-time support for zooming and panning into stored and live panoramic video
In Proceedings of the 5th ACM Multimedia Systems Conference on - MMSys '14. New York, New York, USA: ACM Press, 2014. Status: Published
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the 5th ACM Multimedia Systems Conference on - MMSys '14 |
Pagination | 168–171 |
Publisher | ACM Press |
Place Published | New York, New York, USA |
ISBN Number | 9781450327053 |
URL | http://dl.acm.org/citation.cfm?doid=2557642.2579370 |
DOI | 10.1145/2557642.2579370 |
Be Your Own Cameraman: Real-Time Support for Zooming and Panning Into Stored and Live Panoramic Video
In Proceedings of the 5th annual ACM conference on Multimedia Systems (MMSYS). ACM, 2014. Status: Published
High-resolution panoramic video with a wide field of view is popular in many contexts. However, in many scenarios, such as surveillance and sports, it is often desirable to zoom and pan into the generated video. A challenge in this respect is real-time support; in this demo, we present an end-to-end real-time panorama system with interactive zoom and panning. Our system, installed at Alfheim Stadium, home of a Norwegian premier league soccer team, generates a cylindrical panorama live from five 2K cameras, and the perspective is corrected in real time when the view is presented to the client. This gives a better and more natural zoom than existing systems that use perspective panoramas and implement zoom as a plain crop. Our experimental results indicate that virtual views can be generated well within the frame-rate deadline: on a GPU, the processing time per frame is about 10 milliseconds. The proposed demo lets participants interactively zoom and pan into stored panorama videos generated at Alfheim Stadium and into live video from a 2-camera array on-site.
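The perspective correction described above amounts to casting a ray for every pixel of the virtual camera and looking up where that ray hits the cylindrical panorama. The sketch shows only this per-pixel mapping with nearest-neighbour sampling on the CPU. The assumptions are a panorama spanning `pano_hfov` radians centred on yaw 0, square pixels (cylinder radius `pano_w / pano_hfov` pixels) and a pan/tilt/zoom parameterisation chosen for illustration. The system's actual projection may differ in details such as interpolation and wrap-around.

```python
import math

def view_to_panorama(u, v, out_w, out_h, focal_px, yaw, pitch, pano_w, pano_h, pano_hfov):
    """Map pixel (u, v) of a virtual perspective camera to (col, row) in a
    cylindrical panorama covering pano_hfov radians, centred on yaw 0."""
    # Ray through the pixel in camera coordinates: x right, y down, z forward.
    x, y, z = u - out_w / 2, v - out_h / 2, focal_px
    # Tilt around the x axis (positive pitch looks down).
    y, z = (y * math.cos(pitch) + z * math.sin(pitch),
            -y * math.sin(pitch) + z * math.cos(pitch))
    # Pan around the vertical axis (positive yaw looks right).
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    # Intersect with the cylinder: angle around the axis and height on it.
    theta = math.atan2(x, z)
    height = y / math.hypot(x, z)
    radius_px = pano_w / pano_hfov  # cylinder radius in panorama pixels
    col = pano_w / 2 + theta * radius_px
    row = pano_h / 2 + height * radius_px
    return col, row

def render_view(pano, out_w, out_h, focal_px, yaw, pitch, pano_hfov):
    """Nearest-neighbour resampling of a panorama (list of rows) into a virtual view."""
    pano_h, pano_w = len(pano), len(pano[0])
    view = []
    for v in range(out_h):
        row_px = []
        for u in range(out_w):
            col, row = view_to_panorama(u + 0.5, v + 0.5, out_w, out_h, focal_px,
                                        yaw, pitch, pano_w, pano_h, pano_hfov)
            # Clamp to the panorama borders (no wrap-around for partial panoramas).
            ci = min(max(math.floor(col), 0), pano_w - 1)
            ri = min(max(math.floor(row), 0), pano_h - 1)
            row_px.append(pano[ri][ci])
        view.append(row_px)
    return view
```

Zooming corresponds to increasing `focal_px`, and panning to changing `yaw` and `pitch`. Because every output pixel is computed independently, the mapping parallelises naturally on a GPU.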
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the 5th annual ACM conference on Multimedia Systems (MMSYS) |
Date Published | March |
Publisher | ACM |
DOI | 10.1145/2557642.2579370 |
Real-Time HDR Panorama Video
In Proceedings of the ACM International Conference on Multimedia - MM '14. New York, New York, USA: ACM Press, 2014. Status: Published
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the ACM International Conference on Multimedia - MM '14 |
Pagination | 1205–1208 |
Publisher | ACM Press |
Place Published | New York, New York, USA |
ISBN Number | 9781450330633 |
URL | http://dl.acm.org/citation.cfm?doid=2647868.2655049 |
DOI | 10.1145/2647868.2655049 |
Real-Time HDR Panorama Video
In ACM International Conference on Multimedia, 2014. Status: Published
Interest in wide field-of-view panorama video is increasing. In this respect, we have an application that uses an array of cameras overlooking a soccer stadium. The inputs from these cameras are stitched together to provide a panoramic view of the stadium. One of the challenges we face is that large parts of the field are obscured by shadows on sunny days, which results in unsatisfying video quality. We have therefore implemented and evaluated multiple algorithms related to high dynamic range (HDR) video. The evaluation shows that a combination of several approaches gives the most useful results in our scenario.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | ACM International Conference on Multimedia |
Date Published | November |
Keywords | Conference |
Soccer Video and Player Position Dataset
In Proceedings of the 5th ACM Multimedia Systems Conference on - MMSys '14. New York, New York, USA: ACM Press, 2014. Status: Published
This paper presents a dataset of body-sensor traces and corresponding videos from several professional soccer games captured in late 2013 at the Alfheim Stadium in Tromsø, Norway. Player data, including field position, heading, and speed are sampled at 20 Hz ...
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the 5th ACM Multimedia Systems Conference on - MMSys '14 |
Pagination | 18–23 |
Publisher | ACM Press |
Place Published | New York, New York, USA |
ISBN Number | 9781450327053 |
URL | http://dl.acm.org/citation.cfm?doid=2557642.2563677 |
DOI | 10.1145/2557642.2563677 |
Soccer Video and Player Position Dataset
In Proceedings of the 5th annual ACM conference on Multimedia Systems (MMSYS). ACM, 2014. Status: Published
This paper presents a dataset of body-sensor traces and corresponding videos from several professional soccer games captured in late 2013 at the Alfheim Stadium in Tromsø, Norway. Player data, including field position, heading, and speed, are sampled at 20 Hz using the highly accurate ZXY Sport Tracking system. Additional per-player statistics, like total distance covered and distance covered in different speed classes, are also included with a 1 Hz sampling rate. The provided videos are in high definition and captured using two stationary camera arrays mounted at an elevated position above the tribune area close to the center of the field. The camera array is configured to cover the entire soccer field, and each camera can be used individually or as part of a stitched panorama video. This combination of body-sensor data and videos enables computer-vision algorithms for feature extraction, object tracking, background subtraction, and similar tasks to be tested against the ground truth contained in the sensor traces.
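Per-player statistics of the kind included in the dataset can be derived from the 20 Hz position traces. The sketch accumulates the distance covered and splits it into speed classes. The sample format (x, y in metres) and the speed-class boundaries are assumptions for illustration, not the dataset's actual file format or class definitions.

```python
import math

SAMPLE_RATE_HZ = 20  # position sampling rate stated for the tracking data
# Hypothetical speed classes in m/s: (name, lower bound inclusive), ascending.
SPEED_CLASSES = [("walk", 0.0), ("jog", 2.0), ("run", 4.0), ("sprint", 7.0)]

def speed_class(speed):
    """Return the highest class whose lower bound the speed reaches."""
    name = SPEED_CLASSES[0][0]
    for cls, lower in SPEED_CLASSES:
        if speed >= lower:
            name = cls
    return name

def distance_by_speed_class(positions):
    """positions: list of (x, y) in metres, one sample every 1/SAMPLE_RATE_HZ s."""
    dt = 1.0 / SAMPLE_RATE_HZ
    totals = {cls: 0.0 for cls, _ in SPEED_CLASSES}
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        totals[speed_class(step / dt)] += step
    return totals

# Hypothetical trace: 1 s walking at 1.5 m/s along x, then 1 s sprinting at 8 m/s.
trace = [(1.5 * i / 20, 0.0) for i in range(21)]
trace += [(1.5 + 8.0 * i / 20, 0.0) for i in range(1, 21)]
print({k: round(v, 2) for k, v in distance_by_speed_class(trace).items()})
```

For this trace the result attributes 1.5 m to walking and 8.0 m to sprinting. The total distance covered is simply the sum over all classes. Comparing such derived values with the dataset's own 1 Hz statistics would be one way to sanity-check a parser.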
Afilliation | Communication Systems, Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | Proceedings of the 5th annual ACM conference on Multimedia Systems (MMSYS) |
Publisher | ACM |
Using a Commodity Hardware Video Encoder for Interactive Video Streaming
In IEEE International Symposium on Multimedia (ISM 2014). IEEE, 2014. Status: Published
Using a Commodity Hardware Video Encoder for Interactive Video Streaming
Over the last few years, video streaming has become one of the most dominant Internet services. With the increased availability of high-speed Internet access, multimedia services are also becoming more interactive and immersive; examples include cloud gaming and systems where users can interact with high-resolution content. At the same time, hardware video encoders have been built into commodity hardware. We evaluate one of these encoders in a scenario where individual streams are delivered to each end user. Our results show that we can reduce the CPU time spent on video processing by almost half, while also greatly reducing the power consumption of the system. We also compare the visual quality and frame size of the hardware-based encoder with a software-based approach and find no significant difference.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | IEEE International Symposium on Multimedia (ISM 2014) |
Publisher | IEEE |
Journal Article
Bagadus: an Integrated Real-Time System for Soccer Analytics
ACM Transactions on Multimedia Computing, Communications, and Applications 10 (2014): 14:1-14:21. Status: Published
Bagadus: an Integrated Real-Time System for Soccer Analytics
The importance of winning has increased the role of performance analysis in the sports industry, and this underscores how statistics and technology keep changing the way sports are played. Thus, this is a growing area of interest, both from a computer system view in managing the technical challenges and from a sport performance view in aiding the development of athletes. In this respect, Bagadus is a real-time prototype of a sports analytics application using soccer as a case study. Bagadus integrates a sensor system, a soccer analytics annotations system, and a video processing system using a video camera array. A prototype is currently installed at Alfheim Stadium in Norway, and in this article, we describe how the system can be used in real-time to playback events. The system supports both stitched panorama video and camera switching modes and creates video summaries based on queries to the sensor system. Moreover, we evaluate the system from a systems point of view, benchmarking different approaches, algorithms, and trade-offs, and show how the system runs in real time.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | ACM Transactions on Multimedia Computing, Communications, and Applications |
Volume | 10 |
Number | 1s |
Pagination | 14:1-14:21 |
Date Published | January |
DOI | 10.1145/2541011 |
Bagadus: An integrated real-time system for soccer analytics
ACM Transactions on Multimedia Computing, Communications, and Applications 10 (2014): 1-21. Status: Published
Bagadus: An integrated real-time system for soccer analytics
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | ACM Transactions on Multimedia Computing, Communications, and Applications |
Volume | 10 |
Pagination | 1–21 |
Publisher | ACM |
ISSN | 1551-6857 |
URL | http://dl.acm.org/citation.cfm?doid=2576908.2541011 |
DOI | 10.1145/2541011 |
Processing Panorama Video in Real-Time
International Journal of Semantic Computing 8 (2014): 209-227. Status: Published
Processing Panorama Video in Real-Time
There are many scenarios where high-resolution, wide field-of-view video is useful. Such panorama video may be generated using camera arrays where the feeds from multiple cameras pointing at different parts of the captured area are stitched together. However, processing the different steps of a panorama video pipeline in real time is challenging due to the high data rates and the stringent timeliness requirements. In our research, we use panorama video in a sport analysis system called Bagadus. This system is deployed at Alfheim Stadium in Tromsø, and due to live usage, the video events must be generated in real time. In this paper, we describe our real-time panorama system built using a low-cost CCD HD video camera array. We describe how we have implemented different components and evaluated alternatives. The performance results from experiments run on commodity hardware with and without co-processors like graphics processing units (GPUs) show that the entire pipeline is able to run in real time.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | International Journal of Semantic Computing |
Volume | 8 |
Number | 2 |
Pagination | 209-227 |
Date Published | September |
DOI | 10.1142/S1793351X14400054 |
Processing Panorama Video in Real-time
International Journal of Semantic Computing 08 (2014): 209-227. Status: Published
Processing Panorama Video in Real-time
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2014 |
Journal | International Journal of Semantic Computing |
Volume | 08 |
Pagination | 209–227 |
Publisher | World Scientific |
ISSN | 1793-351X |
URL | http://www.worldscientific.com/doi/abs/10.1142/S1793351X14400054 |
DOI | 10.1142/S1793351X14400054 |
Poster
Performance and Application of the NVIDIA NVENC H.264 Encoder
In GPU Technology Conference 2014. Nvidia, 2014. Status: Published
Performance and Application of the NVIDIA NVENC H.264 Encoder
This poster describes the delivery pipeline in the Bagadus soccer analysis system. The delivery pipeline takes a real-time stitched panorama video, and generates a personal virtual camera that can be controlled by the clients (end-users). An important component in this pipeline is the H.264 encoding of the personalized virtual view before delivery. By using Nvidia's NVENC hardware encoder, we are able to maintain the same visual quality as the software x264 encoder with a reduction in both CPU utilization and encode latency.
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2014 |
Secondary Title | GPU Technology Conference 2014 |
Date Published | 03/2014 |
Publisher | Nvidia |
URL | http://on-demand.gputechconf.com/gtc/2014/poster/pdf/P4188_real-time_pan... |
Proceedings, refereed
Bagadus: an Integrated System for Arena Sports Analytics - a Soccer Case Study
In Proceedings of the 4th annual ACM conference on Multimedia Systems (MMSYS). ACM, 2013. Status: Published
Bagadus: an Integrated System for Arena Sports Analytics - a Soccer Case Study
Sports analytics is a growing area of interest, both from a computer system view to manage the technical challenges and from a sport performance view to aid the development of athletes. In this paper, we present Bagadus, a prototype of a sports analytics application using soccer as a case study. Bagadus integrates a sensor system, a soccer analytics annotations system and a video processing system using a video camera array. A prototype is currently installed at Alfheim Stadium in Norway, and in this paper, we describe how the system can follow and zoom in on particular player(s). The system can also play out events from the games using stitched panorama video or camera switching mode, and create video summaries based on queries to the sensor system. Furthermore, we evaluate the system from a systems point of view, benchmarking different approaches, algorithms and trade-offs.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | Proceedings of the 4th annual ACM conference on Multimedia Systems (MMSYS) |
Pagination | 48-59 |
Date Published | March |
Publisher | ACM |
ISBN Number | 978-1-4503-1894-5 |
Demonstrating Hundreds of AIs in One Scene
In Entertainment Computing (ICEC). Vol. LNCS 8215. Springer, 2013. Status: Published
Demonstrating Hundreds of AIs in One Scene
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | Entertainment Computing (ICEC) |
Volume | LNCS 8215 |
Pagination | 195-199 |
Publisher | Springer |
DOI | 10.1007/978-3-642-41106-9_29 |
Efficient Implementation and Processing of a Real-Time Panorama Video Pipeline
In IEEE International Symposium on Multimedia (ISM 2013). IEEE, 2013. Status: Published
Efficient Implementation and Processing of a Real-Time Panorama Video Pipeline
High-resolution, wide field-of-view video generated from multiple camera feeds has many use cases. However, processing the different steps of a panorama video pipeline in real time is challenging due to the high data rates and the stringent timeliness requirements. We use panorama video in a sport analysis system where video events must be generated in real time. In this respect, we present a system for real-time panorama video generation from an array of low-cost CCD HD video cameras. We describe how we have implemented different components and evaluated alternatives. We also present performance results with and without co-processors like graphics processing units (GPUs), and we evaluate each individual component and show how the entire pipeline is able to run in real time on commodity hardware.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2013 |
Conference Name | IEEE International Symposium on Multimedia (ISM 2013) |
Pagination | 76-83 |
Publisher | IEEE |
Keywords | Conference |
Poster
Real-time Panorama Video Processing Using NVIDIA GPUs
In GPU Technology Conference 2013. Nvidia, 2013. Status: Published
Real-time Panorama Video Processing Using NVIDIA GPUs
Sports analytics is a growing area of interest, both from a computer system view to manage the technical challenges and from a sport performance view to aid the development of athletes. We have been working on Bagadus, a prototype of a sports analytics application using soccer as a case study. Bagadus integrates a sensor system, a soccer analytics annotations system and a video processing system using a video camera array. A prototype is currently installed at Alfheim Stadium in Norway. An important part of the system is playback of events from the games using stitched panorama video. Creating these panorama videos in real time poses many technical challenges, and to meet them we utilize GPGPU computing on NVIDIA GPUs with CUDA.
Link to the 2013 GPU Technology Conference poster program: http://www.gputechconf.com/page/posters.html#vidimg
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2013 |
Secondary Title | GPU Technology Conference 2013 |
Date Published | 03/2013 |
Publisher | Nvidia |
URL | http://on-demand.gputechconf.com/gtc/2013/poster/pdf/P0201_MariusTennoe.pdf |
Proceedings, refereed
BAGADUS: an Integrated System for Soccer Analysis
In Proceedings of the International Conference on Distributed Smart Cameras (ICDSC). ACM/IEEE, 2012. Status: Published
BAGADUS: an Integrated System for Soccer Analysis
In this demo, we present Bagadus, a prototype of a soccer analysis application which integrates a sensor system, soccer analytics annotations and video processing of a video camera array. The prototype is currently installed at Alfheim Stadium in Norway, and we demonstrate how the system can follow and zoom in on particular player(s), and play out events from the games using the stitched panorama video and/or the camera switching mode.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | Proceedings of the International Conference on Distributed Smart Cameras (ICDSC) |
Date Published | October/November |
Publisher | ACM/IEEE |
LEARS: a Lockless, Relaxed-Atomicity State Model for Parallel Execution of a Game Server Partition
In The 41st International Conference on Parallel Processing Workshops. IACC, 2012. Status: Published
LEARS: a Lockless, Relaxed-Atomicity State Model for Parallel Execution of a Game Server Partition
Supporting thousands of interacting players in a virtual world poses huge challenges with respect to processing. Existing work that addresses the challenge utilizes a variety of spatial partitioning algorithms to distribute the load. If, however, a large number of players needs to interact tightly across an area of the game world, spatial partitioning cannot subdivide this area without incurring massive communication costs, latency or inconsistency. It is a major challenge of game engines to scale such areas to the largest number of players possible; in a deviation from earlier thinking, parallelism on multi-core architectures is applied to increase scalability. In this paper, we evaluate the design and implementation of our game server architecture, called LEARS, which allows for lock-free parallel processing of a single spatial partition by considering every game cycle an atomic tick. Our prototype is evaluated using traces from live game sessions where we measure the server response time for all objects that need timely updates. We also measure how the response time for the multi-threaded implementation varies with the number of threads used. Our results show that the challenge of scaling up a game-server can be an embarrassingly parallel problem.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | The 41st International Conference on Parallel Processing Workshops |
Publisher | IACC |
ISBN Number | 978-0-7695-4795-4 |
Notes | Published as part of the SRMPDS workshop proceedings |
DOI | 10.1109/ICPPW.2012.55 |
Journal Article
Reducing Processing Demands for Multi-Rate Video Encoding: Implementation and Evaluation
International Journal of Multimedia Data Engineering and Management 3 (2012): 1-19. Status: Published
Reducing Processing Demands for Multi-Rate Video Encoding: Implementation and Evaluation
Segmented adaptive HTTP streaming has become the de facto standard for video delivery over the Internet for its ability to scale video quality to the available network resources. Here, each video is encoded in multiple qualities, i.e., the expensive encoding process runs once for each quality layer. These operations consume a lot of time and resources, and in this paper, the authors propose a system for reusing redundant steps in a video encoder to improve the multi-layer encoding pipeline. The idea is to produce multiple outputs, one for each target bitrate and quality, where the intermediate processing steps share and reuse the computationally heavy analysis. A prototype has been implemented using the VP8 reference encoder, and the experimental results show that, for both low- and high-resolution videos, the proposed method can significantly reduce the processing demands and time when encoding the different quality layers.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2012 |
Journal | International Journal of Multimedia Data Engineering and Management |
Volume | 3 |
Number | 2 |
Pagination | 1-19 |
DOI | 10.4018/jmdem.2012040101 |
Proceedings, refereed
A Demonstration of a Lockless, Relaxed Atomicity State Parallel Game Server (LEARS)
In Workshop on Network and Systems Support for Games (NetGames 2011). IEEE / ACM, 2011. Status: Published
A Demonstration of a Lockless, Relaxed Atomicity State Parallel Game Server (LEARS)
Games where thousands of players can interact concurrently pose many challenges with regard to massive parallelism. Earlier work within the field suggests that parallelizing such games is difficult due to synchronization issues. In this paper, we present an implementation of a game server architecture based on a model that allows for massive parallelism. The system is evaluated using traces from live game sessions that have been scaled up to generate massive workloads. We measure the differences in server response time for all objects that need timely updates. We also measure how the response time for the multithreaded implementation varies with the number of threads used. Our results show that implementing a game server can actually be a highly parallel problem.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Workshop on Network and Systems Support for Games (NetGames 2011) |
Pagination | 1-3 |
Publisher | IEEE / ACM |
ISBN Number | 978-1-4577-1932-5 |
Improved Multi-Rate Video Encoding
In International Symposium on Multimedia. IEEE, 2011. Status: Published
Improved Multi-Rate Video Encoding
Adaptive HTTP streaming is frequently used for both live and on-demand video delivery over the Internet. Adaptiveness is often achieved by encoding the video stream in multiple qualities (and thus bitrates), and then transparently switching between the qualities according to bandwidth fluctuations and the resources available for decoding the video content on the end device. For this kind of video delivery over the Internet, H.264 is currently the most used codec, but VP8 is an emerging open-source codec expected to compete with H.264 in the streaming scenario. The challenge is that, when encoding video for adaptive streaming, both VP8 and H.264 must run once for each quality layer, which consumes both time and resources; this is especially important in a live video delivery scenario. In this paper, we address the resource consumption issue by proposing a method for reusing redundant steps in a video encoder, emitting multiple outputs with varying bitrates and qualities. The method shares and reuses the computationally heavy analysis steps, notably macroblock mode decision, intra prediction and inter prediction, between the instances, and outputs video at several rates. The method has been implemented in the VP8 reference encoder, and experimental results show that we can encode the different quality layers at the same rates and qualities as the VP8 reference encoder, while reducing the encoding time significantly.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | International Symposium on Multimedia |
Pagination | 293-300 |
Publisher | IEEE |
ISBN Number | 978-0-7695-4589-9 |
P2G: a Framework for Distributed Real-Time Processing of Multimedia Data
In Proceedings of the International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS) - The 2011 International Conference on Parallel Processing Workshops. IEEE, 2011. Status: Published
P2G: a Framework for Distributed Real-Time Processing of Multimedia Data
The computational demands of multimedia data processing are steadily increasing as consumers call for progressively more complex and intelligent multimedia services. New multi-core hardware architectures provide the required resources, but writing parallel, distributed applications remains a labor-intensive task compared to writing their sequential counterparts. For this reason, Google and Microsoft implemented their respective processing frameworks, MapReduce and Dryad, which allow the developer to think sequentially yet benefit from parallel and distributed execution. An inherent limitation in the design of these processing frameworks is their inability to express arbitrarily complex workloads. The dependency graphs of the frameworks are often limited to directed acyclic graphs, or even pre-determined stages. This is particularly problematic for video encoding and other algorithms that depend on iterative execution. With the Nornir runtime system for parallel programs, which is a Kahn Process Network implementation, we addressed and solved several of these limitations. However, Nornir is more difficult to use than other frameworks due to its complex programming model. In this paper, we build on the knowledge gained from Nornir and present a new framework, called P2G, designed specifically for developing and processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines, and provides both data- and task-parallelism. The framework is implemented to scale transparently with available (heterogeneous) resources, a concept familiar from the cloud computing paradigm. We have implemented an (interchangeable) P2G kernel language to ease development. In this paper, we present a proof-of-concept implementation of a P2G execution node and some experimental examples using complex workloads like Motion JPEG and K-means clustering. The results show that the P2G system is a feasible approach to multimedia processing.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Proceedings of the International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS) - The 2011 International Conference on Parallel Processing Workshops |
Pagination | 416-426 |
Date Published | September |
Publisher | IEEE |
ISBN Number | 978-0-7695-4511-0 |
Processing of Multimedia Data Using the P2G Framework
In Proceedings of the 19th ACM international conference on Multimedia. ACM, 2011. Status: Published
Processing of Multimedia Data Using the P2G Framework
In this demo, we present the P2G framework designed for processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines. P2G is implemented to scale transparently with available resources, i.e., a concept familiar from the cloud computing paradigm. Additionally, P2G supports heterogeneous computing resources, such as x86 and GPU processing cores. We have implemented an interchangeable P2G kernel language which is meant to expose fundamental concepts of the P2G programming model and ease the application development. Here, we demonstrate the P2G execution node using a MJPEG encoder as an example workload when dynamically adding and removing processing cores.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Proceedings of the 19th ACM international conference on Multimedia |
Pagination | 819-820 |
Publisher | ACM |
ISBN Number | 978-1-4503-0616-4 |
Poster
Distributed Real-Time Processing of Multimedia Data With the P2G Framework
2011. Status: Published
Distributed Real-Time Processing of Multimedia Data With the P2G Framework
P2G is a framework designed to integrate concepts from modern batch processing frameworks into the world of real-time multimedia processing, where we seek to scale transparently with the available resources. P2G consists of a compiler and a runtime that analyzes dependencies dynamically and merges or splits kernel instances based on resource availability and performance monitoring.
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2011 |
Date Published | April |
Proceedings, refereed
Cheat Detection Processing: a GPU Versus CPU Comparison
In Workshop on Network and Systems Support for Games (NetGames 2010). ACM IEEE, 2010. Status: Published
Cheat Detection Processing: a GPU Versus CPU Comparison
In modern online multi-player games, game providers are struggling to keep up with the many different types of cheating. Cheat detection is a task that requires a lot of computational resources. Advances made within the field of heterogeneous computing architectures, such as graphics processing units (GPUs), have given developers easier access to considerably more computational resources, enabling a new approach to solving this issue. In this paper, we have developed a small game simulator that includes a customizable physics engine and a cheat detection mechanism that checks the physical model used by the game. To make sure that the mechanisms are fair to all players, they are executed on the server side of the game system. We investigate the advantages of implementing physics cheat detection mechanisms on a GPU using the Nvidia CUDA framework, and we compare the GPU implementation of the cheat detection mechanism with a CPU implementation. The results obtained from the simulations show that offloading the cheat detection mechanisms to the GPU reduces the time spent on cheat detection, enabling the servers to support a larger number of clients.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2010 |
Conference Name | Workshop on Network and Systems Support for Games (NetGames 2010) |
Pagination | 8:1-8:6 |
Publisher | ACM IEEE |
ISBN Number | 978-1-4244-8355-6 |
Tips, Tricks and Troubles: Optimizing for Cell and GPU
In The 20th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2010). ACM, 2010. Status: Published
Tips, Tricks and Troubles: Optimizing for Cell and GPU
When used efficiently, modern multicore architectures, such as Cell and GPUs, provide the processing power required by resource-demanding multimedia workloads. However, the diversity of resources exposed to programmers intrinsically requires specific mindsets for using these resources efficiently, not only compared to an x86 architecture, but also between the Cell and the GPUs. In this context, our analysis of 14 different Motion-JPEG implementations indicates that there is a large potential for optimizing performance, but there are also many pitfalls to avoid. By experimentally evaluating algorithmic choices, inter-core data communication (memory transfers) and architecture-specific capabilities, such as instruction sets, we present tips, tricks and troubles with respect to efficient utilization of the available resources.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2010 |
Conference Name | The 20th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2010) |
Pagination | 75-80 |
Date Published | June |
Publisher | ACM |
ISBN Number | 978-1-4503-0043-8 |
Talks, contributed
Taming Multi-Core Processors - Benefits and Challenges (Norwegian: Temming Av Multikjerneprosessorer - Fordeler Og Utfordringer)
In The Gathering World & Pegasus, 2010. Status: Published
Taming Multi-Core Processors - Benefits and Challenges
Most computers today have a multi-core processor. Perhaps you also have a graphics card in your computer, or perhaps you have a PlayStation 3? Then you also have an asymmetric multi-core processor, which is mostly only put to use when you play games!
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | The Gathering World & Pegasus |
Proceedings, refereed
Improving Disk I/O Performance on Linux
In UpTimes - Proceedings of Linux-Kongress and OpenSolaris Developer Conference 2009. German Unix User Group, 2009. Status: Published
Improving Disk I/O Performance on Linux
The existing Linux disk schedulers are in general efficient, but we have identified two scenarios where we have observed non-optimal behavior. The first is when an application requires a fixed bandwidth, and the second is when an operation performs a file tree traversal. In this paper, we address both scenarios and propose solutions that increase performance in each case.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2009 |
Conference Name | UpTimes - Proceedings of Linux-Kongress and OpenSolaris Developer Conference 2009 |
Pagination | 61-70 |
Date Published | October |
Publisher | German Unix User Group |
ISBN Number | 978-3-86541-358-1 |
Improving File Tree Traversal Performance by Scheduling I/O Operations in User Space
In Proceedings of the 28th IEEE International Performance Computing and Communications Conference (IPCCC). IEEE, 2009. Status: Published
Improving File Tree Traversal Performance by Scheduling I/O Operations in User Space
Current in-kernel disk schedulers provide efficient means to optimize the order of issued, in-queue I/O requests (and thereby minimize disk seeks). However, they fail to optimize sequential multi-file operations, like traversing a large file tree, because only requests from one file are available in the scheduling queue at a time. We have therefore investigated a user-level I/O request sorting approach to reduce inter-file disk arm movements. This is achieved by allowing applications to use the placement of inodes and disk blocks to make a one-sweep schedule for all file I/Os requested by a process, i.e., data placement information is read first, before the low-level I/O requests are issued to the storage system. Our experiments with a modified version of tar show reduced disk arm movements and large performance improvements.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2009 |
Conference Name | Proceedings of the 28th IEEE International Performance Computing and Communications Conference (IPCCC) |
Pagination | 145-152 |
Publisher | IEEE |
ISBN Number | 978-1-4244-5736-6 |
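The IPCCC 2009 abstract above describes a user-space, one-sweep ordering of file I/O: gather placement information for every file first, then issue the reads in sorted order. A minimal Python sketch of that idea follows. It is not the authors' modified tar; the function names are hypothetical, and it assumes the inode number (`st_ino`) as a cheap stand-in for on-disk position, whereas the paper's approach uses inode and disk-block placement (on Linux, physical block positions could instead be queried through the FIEMAP ioctl).

```python
import os
from typing import List, Tuple


def collect_files(root: str) -> List[str]:
    """Walk the tree and return every regular file path; no file data is read yet."""
    paths: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                paths.append(path)
    return paths


def one_sweep_order(paths: List[str]) -> List[str]:
    """Order files by an approximation of their on-disk position.

    Assumption: the (device, inode number) pair is used as a proxy for
    placement. A real implementation would sort by physical block instead.
    """
    keyed: List[Tuple[int, int, str]] = []
    for p in paths:
        st = os.stat(p)  # metadata only: placement info is gathered before any data I/O
        keyed.append((st.st_dev, st.st_ino, p))
    keyed.sort()
    return [p for _dev, _ino, p in keyed]


def traverse_and_read(root: str, chunk: int = 1 << 16) -> int:
    """Read every file under root in the sorted order; return total bytes read."""
    total = 0
    for p in one_sweep_order(collect_files(root)):
        with open(p, "rb") as f:
            while True:
                buf = f.read(chunk)
                if not buf:
                    break
                total += len(buf)
    return total
```

Sorting by inode number only approximates the physical layout; whether it reduces seeks in practice depends on the file system's allocation policy.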
Proceedings, refereed
Evaluation of Multi-Core Scheduling Mechanisms for Heterogeneous Processing Architectures
In Network and Operating System Support for Digital Audio and Video (NOSSDAV 2008). ACM, 2008. Status: Published
Evaluation of Multi-Core Scheduling Mechanisms for Heterogeneous Processing Architectures
General-purpose CPUs with multiple cores are established products, and new heterogeneous technology like the Cell broadband engine and general-purpose GPUs bring an even higher degree of true multi-processing into the market. However, means for utilizing the processing power is immature. Current tools typically assume that exclusive use of these resources is sufficient, but this assumption will soon be invalid because the interest in using their processing power for general-purpose tasks. Among the applications that can benefit from such technology is transcoding support for distributed media applications, where remote participants join and leave dynamically. Transcoding consists of several clearly separated processing operations that consume a lot of resources, such that individual processing units are unable to handle all operations of a session of arbitrary size. The individual operations can then be distributed over several processing units, and data must be moved between them according to the dependencies between operations. Many multi-processor scheduling approaches exist, but to the best of our knowledge, a challenge is still to find mechanisms that can schedule dynamic workloads of communicating operations while taking both the processing and communication requirements into account. For such applications, we believe that feasible scheduling can be performed in two levels, i.e., divided into the task of placing a job onto a processing unit and the task of multitasking time-slices within a single processing unit. We have implemented some simple high-level scheduling mechanisms and simulated a video conferencing scenario running on topologies inspired by existing systems from Intel, AMD, IBM and nVidia. Our results show the importance of using an efficient high-level scheduler.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2008 |
Conference Name | Network and Operating System Support for Digital Audio and Video (NOSSDAV 2008) |
Pagination | 33-38 |
Date Published | May |
Publisher | ACM |
ISBN Number | 978-1-60588-157-6 |
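The two-level split described in this abstract can be illustrated with a minimal sketch. The greedy placement heuristic, the per-unit `capacity` values, the `link_cost` function and the fixed round-robin quantum are all hypothetical choices made for this example; they are not the scheduling mechanisms evaluated in the paper. The high level places each operation where the unit's resulting load plus the cost of moving its input data is lowest, and the low level time-slices the operations placed on one unit.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    capacity: float                     # work per time slice (hypothetical)
    load: float = 0.0                   # work placed on this unit so far
    queue: deque = field(default_factory=deque)


def placement_cost(unit, op, work, deps, placement, link_cost):
    # Time for the unit to finish all its work including this operation,
    # plus the cost of moving input data from the predecessors' units.
    comm = sum(link_cost(placement[p], unit.name) * size
               for p, size in deps.get(op, []))
    return (unit.load + work) / unit.capacity + comm


def place(ops, deps, units, link_cost):
    """High level: greedily assign each operation to the cheapest unit.
    `ops` must be listed in dependency (topological) order."""
    placement = {}
    for op, work in ops:
        best = min(units, key=lambda u: placement_cost(u, op, work, deps,
                                                       placement, link_cost))
        best.load += work
        best.queue.append([op, work])
        placement[op] = best.name
    return placement


def run_round_robin(unit, quantum):
    """Low level: time-slice the operations on one unit; return finish order."""
    finished = []
    while unit.queue:
        op, remaining = unit.queue.popleft()
        remaining -= quantum * unit.capacity
        if remaining > 0:
            unit.queue.append([op, remaining])   # not done: back of the queue
        else:
            finished.append(op)
    return finished


def link_cost(src, dst):
    # Hypothetical cost per unit of data moved between two different units.
    return 0.0 if src == dst else 0.5


# Hypothetical transcoding pipeline: decode -> scale -> encode.
units = [Unit("cpu0", capacity=1.0), Unit("gpu0", capacity=2.0)]
ops = [("decode", 4.0), ("scale", 8.0), ("encode", 6.0)]
deps = {"scale": [("decode", 1.0)], "encode": [("scale", 1.0)]}

placement = place(ops, deps, units, link_cost)
cpu_order = run_round_robin(units[0], quantum=1.0)
gpu_order = run_round_robin(units[1], quantum=1.0)
```

Here `encode` lands on `cpu0` because the faster unit is already loaded with `decode` and `scale`, even though this placement costs one data transfer between units.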
Making an SCI Fabric Dynamically Fault Tolerant
In Workshop on Communication Architecture for Clusters (CAC 2008). IEEE, 2008. Status: Published
In this paper, we present a method for dynamic fault-tolerant routing in SCI networks, implemented on Dolphin Interconnect Solutions hardware. By dynamic fault tolerance, we mean that the interconnection network reroutes affected packets around a fault while the rest of the network remains fully functional. To the best of our knowledge, this is the first reported case of dynamic fault-tolerant routing available on commercial off-the-shelf interconnection network technology without duplicating hardware resources. The development focuses on a 2-D torus topology and is compatible with the existing hardware and software stack. We examine the existing routing mechanisms in SCI. We describe how to let the nodes that detect the faulty component make routing decisions, and what changes are needed in the existing routing to support local rerouting. The new routing algorithm is tested on clusters with real hardware. Our tests show that distributed databases like MySQL can run without interruption while the network reacts to faults. The solution is now part of Dolphin Interconnect Solutions' SCI driver, and hardware development to further decrease the reaction time is underway.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2008 |
Conference Name | Workshop on Communication Architecture for Clusters (CAC 2008) |
Pagination | 1-8 |
Date Published | April |
Publisher | IEEE |
ISBN Number | 978-1-4244-1693-6 |
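To illustrate what local rerouting means, here is a sketch of a generic X-then-Y dimension-order router on an n x n torus, in which the node that sees a faulty outgoing link picks another link itself instead of waiting for a central recomputation. The routing rule, the detour order and the `faulty` link set are assumptions made for this example; this is not Dolphin's SCI routing algorithm, and unlike a production interconnect it makes no attempt to guarantee deadlock freedom.

```python
def ring_dir(a, b, n):
    """+1 or -1 for the shorter way from a to b on a ring of n nodes; 0 if a == b."""
    if a == b:
        return 0
    forward = (b - a) % n
    return 1 if forward <= n - forward else -1


def next_hop(cur, dst, n, faulty):
    """Choose the next node from `cur` towards `dst` on an n x n torus.

    `faulty` holds directed links (node, neighbour) known to be down. The
    decision uses only information local to `cur`: the preferred X-then-Y
    hops first, then a sideways step in a dimension that is already aligned.
    """
    x, y = cur
    dx, dy = ring_dir(x, dst[0], n), ring_dir(y, dst[1], n)
    candidates = []
    if dx:
        candidates.append(((x + dx) % n, y))
    if dy:
        candidates.append((x, (y + dy) % n))
    if not dy:   # Y already aligned: allow a sideways step in Y around the fault
        candidates += [(x, (y + 1) % n), (x, (y - 1) % n)]
    if not dx:   # X already aligned: allow a sideways step in X around the fault
        candidates += [((x + 1) % n, y), ((x - 1) % n, y)]
    for nxt in candidates:
        if (cur, nxt) not in faulty:
            return nxt
    raise RuntimeError(f"no healthy outgoing link at {cur}")


def route(src, dst, n, faulty=frozenset()):
    """Follow next_hop from src to dst; a path with more nodes than the torus
    has must revisit a node, so treat that as a routing loop."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, n, faulty))
        if len(path) > n * n:
            raise RuntimeError("routing loop detected")
    return path


healthy = route((0, 0), (2, 0), n=4)
rerouted = route((0, 0), (2, 0), n=4, faulty={((0, 0), (1, 0))})
```

With the link from (0, 0) to (1, 0) down, the path to (2, 0) detours through the neighbouring Y position and takes four hops instead of two, while routes that do not use the faulty link are unaffected.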
Transparent Protocol Translation and Load Balancing on a Network Processor in a Media Streaming Scenario
In Network and Operating System Support for Digital Audio and Video (NOSSDAV 2008). ACM, 2008. Status: Published
Today, major newspapers and TV stations make live and on-demand audio/video content available, video-on-demand services are becoming common, and even personal media are frequently uploaded to streaming sites. The discussion about the best transport protocol for streaming has been going on for years. Currently, HTTP streaming is common, although the transport of streaming media data over TCP is hindered by TCP's probing behavior, which results in the rapid reduction and slow recovery of the packet rates. On the other hand, UDP has been criticized for being unfair to TCP, and it is therefore often blocked by access network providers. To exploit the benefits of both TCP and UDP, we have implemented a proxy that performs transparent protocol translation in such a way that the video stream is delivered to clients in a TCP-compatible and TCP-friendly way, but with UDP-like smoothness. The translation is related to multicast-to-unicast translation and to voice-over-IP proxies that translate between UDP and TCP. It is also similar to the proxy caching that ISPs employ to reduce bandwidth demands. The unique advantage of our approach is that we avoid full-featured TCP handling on the proxy server but still achieve live protocol translation at line speed in a TCP-compliant, TCP-friendly manner. Although we discard packets just like a sender of non-adaptive video over TCP, we achieve higher user-perceived quality because the proxy can avoid underflows of its receive queue, while also achieving the same average bandwidth as a TCP connection between proxy and client. In this demo, we present our prototype implemented on an Intel IXP2400 network processor. The prototype proxy does not buffer outgoing packets, which results in data loss when the TCP side is congested.
A comparison of HTTP streaming from a web server with RTP/UDP streaming from a video server shows that, in the presence of some loss, our solution (UDP from the server and a proxy that translates to TCP) delivers a smoother stream at playout rate, while the end-to-end TCP stream oscillates heavily.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2008 |
Conference Name | Network and Operating System Support for Digital Audio and Video (NOSSDAV 2008) |
Pagination | 129-130 |
Date Published | May |
Publisher | ACM |
ISBN Number | 978-1-60588-157-6 |
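The no-buffering behaviour described in this abstract can be sketched as a forwarder that passes each UDP payload into a TCP-like stream only if it fits in the congestion window and otherwise drops it, with additive-increase/multiplicative-decrease window updates to stay TCP-friendly. The class name, the MSS value and the simplified window arithmetic are assumptions made for this example; this is not the IXP2400 prototype, which the abstract does not describe at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class NoBufferForwarder:
    """Forward UDP payloads onto a TCP-like connection without an output
    buffer: a payload that does not fit in the congestion window is dropped."""
    cwnd: int            # congestion window in bytes
    mss: int = 1448      # hypothetical maximum segment size
    in_flight: int = 0   # bytes sent but not yet acknowledged
    sent: int = 0
    dropped: int = 0

    def on_udp_packet(self, size: int) -> bool:
        if self.in_flight + size <= self.cwnd:
            self.in_flight += size
            self.sent += 1
            return True
        self.dropped += 1       # no buffering: drop instead of queueing
        return False

    def on_ack(self, acked: int) -> None:
        self.in_flight = max(0, self.in_flight - acked)
        # Additive increase: roughly one MSS per window of acknowledged data.
        self.cwnd += max(1, self.mss * acked // self.cwnd)

    def on_loss(self) -> None:
        # Multiplicative decrease, never below one segment.
        self.cwnd = max(self.mss, self.cwnd // 2)


fwd = NoBufferForwarder(cwnd=3000)
first_burst = [fwd.on_udp_packet(1000) for _ in range(4)]
fwd.on_ack(1000)
after_ack = fwd.on_udp_packet(1000)
fwd.on_loss()
```

The fourth payload in the burst is dropped rather than queued, which mirrors the data loss the abstract reports for a congested TCP side; an acknowledgement frees room for the next payload, and a loss halves the window.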
Proceedings, refereed
Transparent Protocol Translation for Streaming
In ACM International Multimedia Conference (ACM MM). ACM, 2007. Status: Published
The transport of streaming media data over TCP is hindered by TCP's probing behavior, which results in the rapid reduction and slow recovery of the packet rates. On the other hand, UDP has been criticized for being unfair to TCP connections, and it is therefore often blocked in access networks. In this paper, we try to benefit from a combined approach using a proxy that transparently performs transport protocol translation. We transparently translate the client's HTTP requests into RTSP requests and translate the corresponding RTP/UDP/AVP stream into the corresponding HTTP response. This enables the use of UDP on the server side and TCP on the client side. The server benefits because it scales to a higher load when it does not have to handle TCP. On the client side, streaming over TCP has the advantage that connections can be established from the client side and data streams can pass through firewalls. Preliminary tests demonstrate that our protocol translation delivers a smoother stream than HTTP streaming, where the TCP bandwidth oscillates heavily.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2007 |
Conference Name | ACM International Multimedia Conference (ACM MM) |
Pagination | 771-774 |
Date Published | September |
Publisher | ACM |
ISBN Number | 978-1-59593-702-5 |
Notes | (short paper) © ACM, (2007). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 15th international conference on Multimedia (2007), http://doi.acm.org/10.1145/1291233.1291407 |
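The request side of the translation could look roughly like the sketch below, which turns a client's HTTP GET into RTSP/1.0 DESCRIBE and SETUP requests in the format of RFC 2326. The URL mapping (HTTP `Host` header plus path becomes the RTSP URL) and the client port are hypothetical. A real proxy would also take the track URL from the SDP returned by DESCRIBE, issue PLAY with the `Session` header from the SETUP response, and translate the RTP/UDP/AVP stream into the HTTP response body; all of that is omitted here.

```python
def http_get_to_rtsp(http_request, client_rtp_port):
    """Translate an HTTP GET into RTSP/1.0 DESCRIBE and SETUP requests."""
    head, _, _ = http_request.decode("ascii").partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    if method != "GET":
        raise ValueError("only GET is translated in this sketch")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Hypothetical mapping: same host and path, served by the RTSP server.
    url = f"rtsp://{headers['host']}{path}"
    # RTP on client_rtp_port, RTCP on the port above it (client_port range).
    transport = f"RTP/AVP;unicast;client_port={client_rtp_port}-{client_rtp_port + 1}"
    steps = [("DESCRIBE", {"Accept": "application/sdp"}),
             ("SETUP", {"Transport": transport})]
    out = []
    for cseq, (rtsp_method, extra) in enumerate(steps, start=1):
        lines = [f"{rtsp_method} {url} RTSP/1.0", f"CSeq: {cseq}"]
        lines += [f"{k}: {v}" for k, v in extra.items()]
        out.append("\r\n".join(lines) + "\r\n\r\n")
    return out


request = b"GET /clips/news.mp4 HTTP/1.1\r\nHost: media.example.com\r\n\r\n"
describe, setup = http_get_to_rtsp(request, client_rtp_port=5000)
```

Keeping the translation stateless on the request side is one reason such a proxy can stay light: only the RTSP session state, not a full TCP stack towards the server, has to be kept.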
Master's thesis
Fault-Tolerant Routing in SCI Networks
University of Oslo, 2006. Status: Accepted
Fault-tolerant routing has been a hot topic in the academic community for quite some time, and several different approaches have been suggested. In the interconnect industry, however, fault-tolerant routing has not been implemented to the same extent. In this thesis, we have adapted and implemented a local fault-tolerant routing approach in the SCI interconnect technology produced by Dolphin Interconnect Solutions. The existing technology used in SCI is based on a static reconfiguration approach, in which traffic is disabled while the new routing is calculated by a central front-end and distributed to the nodes. Our algorithm builds on the principle of letting the nodes make routing decisions from the information available to them locally, and of preparing the rest of the nodes in the cluster for this unexpected traffic. The algorithm has been tested on real hardware, and we have shown that it can handle several levels of traffic in the network. The tests also show that our method gives the same performance before and after the error occurs, provided the packets experience the same conditions, such as competing traffic and link length. Our routing algorithm is currently integrated into the latest official release of Dolphin Interconnect Solutions' driver.
Affiliation | Communication Systems |
Publication Type | Master's thesis |
Year of Publication | 2006 |
Date Published | August |
Publisher | University of Oslo |