Projects
POPART: Previz for On-set Production - Adaptive Realtime Tracking

Tomorrow’s technology for visual effects will be developed in Norway, in collaboration with leaders of the European film industry, as a result of the EU project POPART. In modern film production, animated effects are used so extensively that audiences now expect them in large Hollywood productions. In European productions, animation is used considerably less, because budgets are smaller and digital effects are associated with high-risk investments and sizable expenses. Through POPART we are developing a product that makes it possible to preview digital effects on set, and the project further aims to make postproduction work considerably more efficient.
Funding source:
The project budget is € 1 300 000, of which the European Union contributes approximately € 1 000 000 through the ICT-18 call in the LEIT program of Horizon 2020. The industry partners of the EU project provide the rest of the funding.
Publications for POPART: Previz for On-set Production - Adaptive Realtime Tracking
Journal Article
Third Life Project
Fresh Perspective (2017): 41-47. Status: Published
The Third Life Project taught us three lessons:
- Staging a real-time video game in a theatrical performance calls into question, and invites re-examination of, what is tangible and actual and what is immaterial and abstract.
- Interactions in virtual environments that are also grounded in the physical world enhance the intuition of both performers and spectators.
- Regular online video conferencing meetings afford numerous opportunities to establish the trust, the reciprocal understanding, and the respect for different goals, practices, expertise and rhythms of work that are all necessary for a rewarding interdisciplinary collaboration.
Affiliation | Communication Systems |
Project(s) | POPART: Previz for On-set Production - Adaptive Realtime Tracking, LADIO: Live Action Data Input/Output |
Publication Type | Journal Article |
Year of Publication | 2017 |
Journal | Fresh Perspective |
Number | 6 |
Pagination | 41-47 |
Date Published | 03/2017 |
Publisher | IETM - International network for contemporary performing arts |
Place Published | Brussels, Belgium |
ISBN Number | 978-2-930897-13-4 |
URL | https://www.ietm.org/en/system/files/publications/ietm_fp_mixed-reality_... |
Proceedings, refereed
Detection and Accurate Localization of Circular Fiducials under Highly Challenging Conditions
In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Vol. 00. Las Vegas, NV, USA: IEEE, 2016. Status: Published
Using fiducial markers ensures reliable detection and identification of planar features in images. Fiducials are used in a wide range of applications, especially when a reliable visual reference is needed, e.g., to track the camera in cluttered or textureless environments. A marker designed for such applications must be robust to partial occlusions, varying distances and angles of view, and fast camera motions. In this paper, we present a robust, highly accurate fiducial system, whose markers consist of concentric rings, along with its theoretical foundations. Relying on projective properties, it allows robust localization of the imaged marker and accurate detection of the position of the image of the (common) circle center. We demonstrate that our system can detect and accurately localize these circular fiducials under very challenging conditions, and the experimental results reveal that it outperforms other recent fiducial systems.
Affiliation | Communication Systems |
Project(s) | POPART: Previz for On-set Production - Adaptive Realtime Tracking |
Publication Type | Proceedings, refereed |
Year of Publication | 2016 |
Conference Name | 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) |
Volume | 00 |
Pagination | 562 - 570 |
Publisher | IEEE |
Place Published | Las Vegas, NV, USA |
ISSN Number | 1063-6919 |
URL | http://ieeexplore.ieee.org/document/7780436/ |
DOI | 10.1109/CVPR.2016.67 |
Publications
Journal Article
Efficient Live and On-Demand Tiled HEVC 360 VR Video Streaming
International Journal of Semantic Computing 13, no. 3 (2019): 367-391. Status: Published
360 panorama video displayed through virtual reality (VR) glasses or large screens offers immersive user experiences, but as such technology becomes commonplace, the need for efficient streaming methods of such high-bitrate videos arises. In this respect, the attention that 360 panorama video has received lately is huge. Many methods have already been proposed, and in this paper, we shed more light on the different trade-offs in order to save bandwidth while preserving the video quality in the user's field-of-view (FoV). Using 360 VR content delivered to a Gear VR head-mounted display with a Samsung Galaxy S7 and to a Huawei Q22 set-top-box, we have tested various tiling schemes analyzing the tile layout, the tiling and encoding overheads, mechanisms for faster quality switching beyond the DASH segment boundaries and quality selection configurations. In this paper, we present an efficient end-to-end design and real-world implementation of such a 360 streaming system. Furthermore, in addition to researching an on-demand system, we also go beyond the existing on-demand solutions and present a live streaming system which strikes a trade-off between bandwidth usage and the video quality in the user's FoV. We have created an architecture that combines RTP and DASH, and our system multiplexes a single HEVC hardware decoder to provide faster quality switching than at the traditional GOP boundaries. We demonstrate the performance and illustrate the trade-offs through real-world experiments where we can report comparable bandwidth savings to existing on-demand approaches, but with faster quality switches when the FoV changes.
Affiliation | Communication Systems, Machine Learning |
Project(s) | No Simula project, Department of Holistic Systems |
Publication Type | Journal Article |
Year of Publication | 2019 |
Journal | International Journal of Semantic Computing |
Volume | 13 |
Issue | 3 |
Number | 3 |
Pagination | 367-391 |
Publisher | World Scientific |
Talk, keynote
Automatic Detection of Angiectasia: Evaluation of Deep Learning and Handcrafted Approaches
In IEEE Conference on Biomedical and Health Informatics (BHI) 2018, 2018. Status: Published
Angiectasia, formerly called angiodysplasia, is one of the most frequent vascular lesions and often the cause of gastrointestinal bleeding. Medical specialists assessing videos of examinations reach a detection performance ranging from 16% for the detection of bleeding to 69% for the detection of angiectasia. In this paper, we present several machine-learning-based approaches for angiectasia detection in wireless video capsule endoscopy images. The most promising results for pixel-wise localization and frame-wise detection are obtained by the proposed deep learning approach using generative adversarial networks (GANs). It achieves a sensitivity of 88% and a specificity of 99.9% for pixel-wise localization, and a sensitivity of 98% and a specificity of 100% for frame-wise detection, which fits the requirements for automatic angiectasia detection in real clinical settings.
Affiliation | Communication Systems |
Project(s) | Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Department of Holistic Systems |
Publication Type | Talk, keynote |
Year of Publication | 2018 |
Location of Talk | IEEE Conference on Biomedical and Health Informatics (BHI) 2018 |
Book Chapter
Camera Synchronization for Panoramic Videos
In MediaSync, 565-592. Springer, 2018. Status: Published
Affiliation | Communication Systems |
Project(s) | Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources, Department of Holistic Systems |
Publication Type | Book Chapter |
Year of Publication | 2018 |
Book Title | MediaSync |
Pagination | 565-592 |
Date Published | 03/2018 |
Publisher | Springer |
URL | https://doi.org/10.1007/978-3-319-65840-7_20 |
DOI | 10.1007/978-3-319-65840-7_20 |
Proceedings, refereed
Deep Learning and Hand-crafted Feature Based Approaches for Polyp Detection in Medical Videos
In 31st IEEE CBMS International Symposium on Computer-Based Medical Systems. Karlstad, Sweden: IEEE, 2018. Status: Published
Affiliation | Communication Systems |
Project(s) | Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | 31st IEEE CBMS International Symposium on Computer-Based Medical Systems |
Pagination | 381-386 |
Publisher | IEEE |
Place Published | Karlstad, Sweden |
ISSN Number | 2372-9198 |
DOI | 10.1109/CBMS.2018.00073 |
Deep Learning and Handcrafted Feature Based Approaches for Automatic Detection of Angiectasia
In 2018 IEEE Conference on Biomedical and Health Informatics (BHI). IEEE, 2018. Status: Published
Angiectasia, formerly called angiodysplasia, is one of the most frequent vascular lesions and often the cause of gastrointestinal bleeding. Medical specialists assessing videos or images of examinations reach a detection performance ranging from 16% for the detection of bleeding to 69% for the detection of angiectasia. This shows that automatic detection to support medical experts can be useful. In this paper, we present several machine-learning-based approaches for angiectasia detection in wireless video capsule endoscopy frames. In summary, the most promising results for pixel-wise localization and frame-wise detection are obtained by the proposed deep learning method using generative adversarial networks (GANs). Using this approach, we achieve a sensitivity of 88% and a specificity of 99.9% for pixel-wise localization, and a sensitivity of 98% and a specificity of 100% for frame-wise detection. Thus, the results demonstrate the capability of using deep learning for automatic angiectasia detection in real clinical settings.
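The sensitivity and specificity figures in this abstract follow the standard confusion-matrix definitions. The sketch below only illustrates that arithmetic; the counts are hypothetical and are not data from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that were correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical frame-wise counts, chosen only to illustrate the formulas.
tp, fn, tn, fp = 98, 2, 1000, 0
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

With these illustrative counts, 98 of 100 positive frames are found and no negative frame is flagged.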
Affiliation | Communication Systems |
Project(s) | Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | 2018 IEEE Conference on Biomedical and Health Informatics (BHI) |
Pagination | 365-368 |
Publisher | IEEE |
Keywords | Angiectasia, computer aided diagnosis, deep learning, Machine learning, video capsular endoscopy |
DOI | 10.1109/BHI.2018.8333444 |
Efficient Live and on-Demand Tiled HEVC 360 VR Video Streaming
In 2018 IEEE International Symposium on Multimedia (ISM). Taichung, Taiwan: IEEE, 2018. Status: Published
With 360° panorama video technology becoming commonplace, the need for efficient streaming methods for such videos arises. We go beyond the existing on-demand solutions and present a live streaming system which strikes a trade-off between bandwidth usage and the video quality in the user’s field-of-view. We have created an architecture that combines RTP and DASH to deliver 360° VR content to a Huawei set-top-box and a Samsung Galaxy S7. Our system multiplexes a single HEVC hardware decoder to provide faster quality switching than at the traditional GOP boundaries. We demonstrate the performance and illustrate the trade-offs through real-world experiments where we can report comparable bandwidth savings to existing on-demand approaches, but with faster quality switches when the field-of-view changes.
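The bandwidth/quality trade-off described here rests on giving tiles inside the field-of-view a higher quality than the rest. The following is a minimal sketch of such a selection, assuming a single row of equally wide tile columns and considering yaw only; the function name, the `"high"`/`"low"` labels and the default parameters are hypothetical and are not taken from the paper.

```python
def select_tile_qualities(yaw_deg: float, fov_deg: float = 90.0, columns: int = 8) -> list[str]:
    """Return one quality label per tile column for a viewer looking at yaw_deg."""
    tile_width = 360.0 / columns
    qualities = []
    for col in range(columns):
        center = (col + 0.5) * tile_width
        # Smallest angular distance between the tile center and the viewing direction.
        distance = abs((center - yaw_deg + 180.0) % 360.0 - 180.0)
        # The tile overlaps the field-of-view when the two angular ranges intersect.
        in_view = distance < (fov_deg + tile_width) / 2.0
        qualities.append("high" if in_view else "low")
    return qualities


print(select_tile_qualities(yaw_deg=0.0))
```

A real system would also account for pitch, the projection format and how quickly a tile can switch quality, which is the part the paper addresses.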
Affiliation | Communication Systems, Machine Learning |
Project(s) | No Simula project, Department of Holistic Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | 2018 IEEE International Symposium on Multimedia (ISM) |
Pagination | 81-88 |
Date Published | 12/2018 |
Publisher | IEEE |
Place Published | Taichung, Taiwan |
DOI | 10.1109/ISM.2018.00022 |
Tradeoffs using Binary and Multiclass Neural Network Classification for Medical Multidisease Detection
In 2018 IEEE International Symposium on Multimedia (ISM). IEEE, 2018. Status: Published
The interest in neural networks has increased significantly, and the application of this type of machine learning is vast, ranging from natural image classification to medical image segmentation. However, many users of neural networks tend to use them as a black-box tool. They do not access all of the possible variations, nor take into account the respective classification accuracies and costs. In our work, we focus on multiclass image classification, and in this research, we shed light on the trade-offs between systems using a single multiclass classification and multiple binary classifiers, respectively. We have tested these classifiers on several modern neural network architectures, including DenseNet, Inception v3, Inception ResNet v2, Xception, NASNet and MobileNet. We have compared several aspects of the performance of these architectures during training and testing using both classification styles. We have compared classification speed and several classification accuracy metrics. Here, we present the results from experiments on a total of 99 networks: 11 multiclass and 88 individual binary networks, for an 8-class classification of medical images. In short, using multiple binary classification networks resulted in a 7% increase in the average F1 score, a 1% increase in average accuracy, a 1% increase in precision, and a 4% increase in average recall. However, on average, such a multi-network style performed the classification 7.6 times slower compared to a single-network multiclass implementation. These collective findings show that both approaches can be applied to modern neural network structures. Several binary networks will often give increased classification accuracy, but at the cost of classification speed and resource consumption.
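The speed/accuracy trade-off between the two styles can be illustrated structurally: one multiclass network needs a single forward pass per image, while a set of binary networks needs one pass per class. The sketch below uses trivial stand-in scoring functions instead of neural networks, and the argmax rule for combining binary outputs is an assumption of this example, not necessarily the rule used in the paper.

```python
from typing import Callable, Sequence

Image = Sequence[float]


def predict_multiclass(image: Image, model: Callable[[Image], Sequence[float]]) -> int:
    """One forward pass yields a score per class; pick the highest."""
    scores = model(image)
    return max(range(len(scores)), key=scores.__getitem__)


def predict_binary_ensemble(image: Image, models: Sequence[Callable[[Image], float]]) -> int:
    """One forward pass per class; pick the class whose binary model is most confident."""
    scores = [model(image) for model in models]
    return max(range(len(scores)), key=scores.__getitem__)


# Stand-in "networks": the toy image already carries one score per class.
multiclass_model = lambda image: list(image)
binary_models = [lambda image, k=k: image[k] for k in range(3)]

toy_image = [0.1, 0.7, 0.2]
print(predict_multiclass(toy_image, multiclass_model))
print(predict_binary_ensemble(toy_image, binary_models))
```

Both calls select class 1 here; the point is only that the ensemble invokes three models where the multiclass path invokes one.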
Affiliation | Communication Systems, Machine Learning |
Project(s) | No Simula project |
Publication Type | Proceedings, refereed |
Year of Publication | 2018 |
Conference Name | 2018 IEEE International Symposium on Multimedia (ISM) |
Pagination | 1-8 |
Date Published | 12/2018 |
Publisher | IEEE |
DOI | 10.1109/ISM.2018.00009 |
PhD Thesis
Processing Cyclic Multimedia Workloads on Modern Architectures
University of Oslo, 2014. Status: Published
Working with modern architectures for high-performance applications is increasingly difficult for programmers as the complexity of both system architectures and software continues to increase. The level of hand tuning and native adaptations required to achieve high performance comes at the cost of limiting the portability of the software. For instance, we show that a compute-intensive DCT algorithm performs better on graphics processors than the best algorithm for x86. In particular, limited portability is true for cyclic multimedia workloads, a set of programs that run continuously with strict requirements for high performance and low latency. An example of a typical multimedia workload is a pipeline of many small image processing algorithms working in tandem to complete a particular task. The input can be videos from one or more live cameras, and the output is a set of video frames with elements from several of the source videos, for example as stitched panorama frames or 3D warped video. Such a setup runs continuously and potentially needs to adapt to various degrees of changes in the setup without interruptions or downtime. To reach the performance goal required by multimedia pipelines, modern, heterogeneous architectures are considered instead of the traditional symmetric multi-processing architectures. We also investigate variations between recent microarchitectures of symmetric processors to identify differences that a low-level scheduler must take into account. Further, since multimedia workloads often need to adapt to various external conditions, e.g., adding another participant to a video conference, we also investigate elastic and portable processing of multimedia workloads. To do this, we propose a framework design and language, which we call P2G.
In the age of Big Data, this idea differs from the typical frameworks used for distributed processing, such as MapReduce and Dryad, in that it is designed for continuous operation instead of batch processing of large workloads. We emphasise heterogeneous support and expose parallel opportunities in workloads in a way that is easy to target since it is similar to sequential execution with multidimensional arrays. The framework ideas are implemented as a prototype and released as an open source platform for further experimentation and evaluation.
Affiliation | Communication Systems |
Publication Type | PhD Thesis |
Year of Publication | 2014 |
Publisher | University of Oslo |
Thesis Type | phd |
Proceedings, refereed
LEARS: a Lockless, Relaxed-Atomicity State Model for Parallel Execution of a Game Server Partition
In The 41st International Conference on Parallel Processing Workshops. IACC, 2012. Status: Published
Supporting thousands of interacting players in a virtual world poses huge challenges with respect to processing. Existing work that addresses the challenge utilizes a variety of spatial partitioning algorithms to distribute the load. If, however, a large number of players needs to interact tightly across an area of the game world, spatial partitioning cannot subdivide this area without incurring massive communication costs, latency or inconsistency. It is a major challenge of game engines to scale such areas to the largest number of players possible; in a deviation from earlier thinking, parallelism on multi-core architectures is applied to increase scalability. In this paper, we evaluate the design and implementation of our game server architecture, called LEARS, which allows for lock-free parallel processing of a single spatial partition by considering every game cycle an atomic tick. Our prototype is evaluated using traces from live game sessions where we measure the server response time for all objects that need timely updates. We also measure how the response time for the multi-threaded implementation varies with the number of threads used. Our results show that the challenge of scaling up a game-server can be an embarrassingly parallel problem.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | The 41st International Conference on Parallel Processing Workshops |
Publisher | IACC |
ISBN Number | 978-0-7695-4795-4 |
Notes | Published as part of the SRMPDS workshop proceedings |
DOI | 10.1109/ICPPW.2012.55 |
Low-Level Scheduling Implications for Data-Intensive Cyclic Workloads on Modern Microarchitectures
In The 41st International Conference on Parallel Processing Workshops. IACC, 2012. Status: Published
Processing data-intensive multimedia workloads is challenging, and scheduling and resource management are vitally important for the best possible utilization of machine resources. In earlier work, we have used work-stealing, which is frequently used today, and proposed improvements. We found already then that no single work-stealing variant is ideally suited for all workloads. Therefore, we investigate in more detail in this paper how workloads consisting of various multimedia filter sequences should be scheduled on a variety of modern processor architectures to maximize performance. Our results show that a low-level scheduler additionally cannot achieve optimal performance without taking the specific micro-architecture, the placement of dependent tasks and cache sizes into account. These details are not generally available to application developers, and they differ between deployments. Our proposal is therefore to use performance monitoring and dynamic adaptation for the cyclic workloads of our target multimedia scenario, where operations are repeated cyclically on a stream of data.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2012 |
Conference Name | The 41st International Conference on Parallel Processing Workshops |
Publisher | IACC |
ISBN Number | 978-0-7695-4795-4 |
DOI | 10.1109/ICPPW.2012.49 |
Journal Article
Reducing Processing Demands for Multi-Rate Video Encoding: Implementation and Evaluation
International Journal of Multimedia Data Engineering and Management 3 (2012): 1-19. Status: Published
Segmented adaptive HTTP streaming has become the de facto standard for video delivery over the Internet for its ability to scale video quality to the available network resources. Here, each video is encoded in multiple qualities, i.e., running the expensive encoding process for each quality layer. However, these operations consume a lot of both time and resources, and in this paper, the authors propose a system for reusing redundant steps in a video encoder to improve the multi-layer encoding pipeline. The idea is to have multiple outputs for each of the target bitrates and qualities where the intermediate processing steps share and reuse the computationally heavy analysis. A prototype has been implemented using the VP8 reference encoder, and their experimental results show that for both low- and high-resolution videos the proposed method can significantly reduce the processing demands and time when encoding the different quality layers.
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2012 |
Journal | International Journal of Multimedia Data Engineering and Management |
Volume | 3 |
Number | 2 |
Pagination | 1-19 |
DOI | 10.4018/jmdem.2012040101 |
Proceedings, refereed
A Demonstration of a Lockless, Relaxed Atomicity State Parallel Game Server (LEARS)
In Workshop on Network and Systems Support for Games (NetGames 2011). IEEE / ACM, 2011. Status: Published
Games where thousands of players can interact concurrently pose many challenges with regard to massive parallelism. Earlier work within the field suggests that this is difficult due to synchronization issues. In this paper, we present an implementation of a game server architecture based on a model that allows for massive parallelism. The system is evaluated using traces from live game sessions that have been scaled up to generate massive workloads. We measure the differences in server response time for all objects that need timely updates. We also measure how the response time for the multithreaded implementation varies with the number of threads used. Our results show that implementing a game server can actually be a highly parallel problem.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Workshop on Network and Systems Support for Games (NetGames 2011) |
Pagination | 1-3 |
Publisher | IEEE / ACM |
ISBN Number | 978-1-4577-1932-5 |
Improved Multi-Rate Video Encoding
In International Symposium on Multimedia. IEEE, 2011. Status: Published
Adaptive HTTP streaming is frequently used for both live and on-demand video delivery over the Internet. Adaptiveness is often achieved by encoding the video stream in multiple qualities (and thus bitrates), and then transparently switching between the qualities according to the bandwidth fluctuations and the amount of resources available for decoding the video content on the end device. For this kind of video delivery over the Internet, H.264 is currently the most used codec, but VP8 is an emerging open-source codec expected to compete with H.264 in the streaming scenario. The challenge is that, when encoding video for adaptive video streaming, both VP8 and H.264 run once for each quality layer, consuming both time and resources, which is especially important in a live video delivery scenario. In this paper, we address the resource consumption issues by proposing a method for reusing redundant steps in a video encoder, emitting multiple outputs with varying bitrates and qualities. It shares and reuses the computationally heavy analysis step, notably macro-block mode decision, intra prediction and inter prediction, between the instances, and outputs video in several rates. The method has been implemented in the VP8 reference encoder, and experimental results show that we can encode the different quality layers at the same rates and qualities compared to the VP8 reference encoder, while reducing the encoding time significantly.
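The reuse idea in this abstract can be pictured as running the expensive analysis once per frame and feeding its result to a cheap, layer-specific step for every target quality. The sketch below is a toy model under that assumption: `analyze`, `encode_layer` and the integer "frames" are hypothetical stand-ins and have nothing to do with the actual VP8 encoder code.

```python
from collections import defaultdict

analysis_calls = 0  # counts how often the expensive step runs


def analyze(frame: list[int]) -> int:
    """Stand-in for the shared, expensive analysis (mode decision, prediction)."""
    global analysis_calls
    analysis_calls += 1
    return sum(frame) // len(frame)  # toy "prediction": the mean sample value


def encode_layer(frame: list[int], prediction: int, quantizer: int) -> list[int]:
    """Cheap per-layer step: quantize the residual with a layer-specific step size."""
    return [(sample - prediction) // quantizer for sample in frame]


def encode_multi_rate(frames: list[list[int]], quantizers: list[int]) -> dict[int, list[list[int]]]:
    layers = defaultdict(list)
    for frame in frames:
        prediction = analyze(frame)  # run once per frame
        for q in quantizers:         # reused for every quality layer
            layers[q].append(encode_layer(frame, prediction, q))
    return dict(layers)


frames = [[10, 12, 14, 16], [20, 20, 24, 28]]
layers = encode_multi_rate(frames, quantizers=[1, 4])
print(analysis_calls, "analysis runs for", len(layers), "quality layers")
```

Running one independent encoder per layer would call the analysis once per frame and per layer; here the call count depends only on the number of frames.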
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | International Symposium on Multimedia |
Pagination | 293-300 |
Publisher | IEEE |
ISBN Number | 978-0-7695-4589-9 |
P2G: a Framework for Distributed Real-Time Processing of Multimedia Data
In Proceedings of the International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS) - The 2011 International Conference on Parallel Processing Workshops. IEEE, 2011. Status: Published
The computational demands of multimedia data processing are steadily increasing as consumers call for progressively more complex and intelligent multimedia services. New multi-core hardware architectures provide the required resources, but writing parallel, distributed applications remains a labor-intensive task compared to their sequential counterparts. For this reason, Google and Microsoft implemented their respective processing frameworks, MapReduce and Dryad, as they allow the developer to think sequentially, yet benefit from parallel and distributed execution. An inherent limitation in the design of these processing frameworks is their inability to express arbitrarily complex workloads. The dependency graphs of the frameworks are often limited to directed acyclic graphs, or even pre-determined stages. This is particularly problematic for video encoding and other algorithms that depend on iterative execution. With the Nornir runtime system for parallel programs, which is a Kahn Process Network implementation, we addressed and solved several of these limitations. However, it is more difficult to use than other frameworks due to its complex programming model. In this paper, we build on the knowledge gained from Nornir and present a new framework, called P2G, designed specifically for developing and processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines, and provides both data- and task-parallelism. The framework is implemented to scale transparently with available (heterogeneous) resources, a concept familiar from the cloud computing paradigm. We have implemented an (interchangeable) P2G kernel language to ease development. In this paper, we present a proof of concept implementation of a P2G execution node and some experimental examples using complex workloads like Motion JPEG and K-means clustering. The results show that the P2G system is a feasible approach to multimedia processing.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Proceedings of the International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS) - The 2011 International Conference on Parallel Processing Workshops |
Pagination | 416-426 |
Date Published | September |
Publisher | IEEE |
ISBN Number | 978-0-7695-4511-0 |
Processing of Multimedia Data Using the P2G Framework
In Proceedings of the 19th ACM international conference on Multimedia. ACM, 2011. Status: Published
In this demo, we present the P2G framework designed for processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines. P2G is implemented to scale transparently with available resources, a concept familiar from the cloud computing paradigm. Additionally, P2G supports heterogeneous computing resources, such as x86 and GPU processing cores. We have implemented an interchangeable P2G kernel language which is meant to expose fundamental concepts of the P2G programming model and ease application development. Here, we demonstrate the P2G execution node using an MJPEG encoder as an example workload when dynamically adding and removing processing cores.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2011 |
Conference Name | Proceedings of the 19th ACM international conference on Multimedia |
Pagination | 819-820 |
Publisher | ACM |
ISBN Number | 978-1-4503-0616-4 |
Poster
Distributed Real-Time Processing of Multimedia Data With the P2G Framework
2011. Status: Published
P2G is a framework designed to integrate concepts from modern batch processing frameworks into the world of real-time multimedia processing, where we seek to scale transparently with the available resources. P2G consists of a compiler and a run-time that analyzes dependencies dynamically and merges or splits kernel instances based on resource availability and performance monitoring.
Affiliation | Communication Systems |
Publication Type | Poster |
Year of Publication | 2011 |
Date Published | April |
Talks, contributed
Temming Av Multikjerneprosessorer - Fordeler Og Utfordringer
In The Gathering World & Pegasus, 2010. Status: Published
Most computers today have a multicore processor. Perhaps you also have a graphics card in your computer, or perhaps you own a PlayStation 3? Then you also have an asymmetric multicore processor, one that is mostly put to use only when you play games!
Affiliation | Communication Systems |
Publication Type | Talks, contributed |
Year of Publication | 2010 |
Location of Talk | The Gathering World & Pegasus |
Journal Article
The Nornir Run-Time System for Parallel Programs Using Kahn Process Networks on Multi-Core Machines - a Flexible Alternative to MapReduce
The Journal of Supercomputing (2010): 1-27. Status: Published
Even though shared-memory concurrency is a paradigm frequently used for developing parallel applications on small- and middle-sized machines, experience has shown that it is hard to use. This is largely caused by synchronization primitives which are low-level, inherently nondeterministic, and, consequently, non-intuitive to use. In this paper, we present the Nornir run-time system. Nornir is comparable to well-known frameworks such as MapReduce and Dryad that are recognized for their efficiency and simplicity. Unlike these frameworks, Nornir also supports process structures containing branches and cycles. Nornir is based on the formalism of Kahn process networks, which is a shared-nothing, message-passing model of concurrency. We deem this model a simple and deterministic alternative to shared-memory concurrency. Experiments with real and synthetic benchmarks on up to 8 CPUs show that performance in most cases scales almost linearly with the number of CPUs, when not limited by data dependencies. We also show that the modeling flexibility allows Nornir to outperform its MapReduce counterparts using well-known benchmarks.
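A Kahn process network, the formalism named in this abstract, consists of sequential processes that share no state and communicate only through FIFO channels with blocking reads. The sketch below illustrates that model with Python threads and `queue.Queue`; it is not Nornir's API, and the process names and the `DONE` end-of-stream marker are assumptions of this example.

```python
import queue
import threading

DONE = object()  # end-of-stream marker (an assumption of this sketch)


def producer(out_ch: queue.Queue, n: int) -> None:
    """Emit the integers 0..n-1, then signal end of stream."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(DONE)


def square(in_ch: queue.Queue, out_ch: queue.Queue) -> None:
    """Read each value with a blocking get and forward its square."""
    while (item := in_ch.get()) is not DONE:
        out_ch.put(item * item)
    out_ch.put(DONE)


def collect(in_ch: queue.Queue, results: list) -> None:
    """Drain the channel into a list until end of stream."""
    while (item := in_ch.get()) is not DONE:
        results.append(item)


def run_network(n: int) -> list:
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    results: list = []
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=collect, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_network(5))  # [0, 1, 4, 9, 16]
```

Because each channel has exactly one writer and one reader and reads block until data arrives, the output is the same regardless of how the threads are scheduled, which is the determinism property the abstract refers to.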
Affiliation | Communication Systems |
Publication Type | Journal Article |
Year of Publication | 2010 |
Journal | The Journal of Supercomputing |
Number | Online first |
Pagination | 1-27 |
Notes | In print: January 2013, Volume 63, Issue 1, pp 191-217 |
DOI | 10.1007/s11227-010-0503-2 |
Proceedings, refereed
Tips, Tricks and Troubles: Optimizing for Cell and GPU
In The 20th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2010). ACM, 2010. Status: Published
When used efficiently, modern multicore architectures, such as Cell and GPUs, provide the processing power required by resource-demanding multimedia workloads. However, the diversity of resources exposed to the programmers intrinsically requires specific mindsets for efficiently utilizing these resources, not only compared to an x86 architecture, but also between the Cell and the GPUs. In this context, our analysis of 14 different Motion-JPEG implementations indicates that there exists a large potential for optimizing performance, but there are also many pitfalls to avoid. By experimentally evaluating algorithmic choices, inter-core data communication (memory transfers) and architecture-specific capabilities, such as instruction sets, we present tips, tricks and troubles with respect to efficient utilization of the available resources.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2010 |
Conference Name | The 20th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2010) |
Pagination | 75-80 |
Date Published | June |
Publisher | ACM |
ISBN Number | 978-1-4503-0043-8 |
Proceedings, refereed
Improving Disk I/O Performance on Linux
In UpTimes - Proceedings of Linux-Kongress and OpenSolaris Developer Conference 2009. German Unix User Group, 2009. Status: Published
The existing Linux disk schedulers are in general efficient, but we have identified two scenarios in which we have observed non-optimal behavior. The first is when an application requires a fixed bandwidth, and the second is when an operation performs a file tree traversal. In this paper, we address both scenarios and propose a solution for each that increases performance.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2009 |
Conference Name | UpTimes - Proceedings of Linux-Kongress and OpenSolaris Developer Conference 2009 |
Pagination | 61-70 |
Date Published | October |
Publisher | German Unix User Group |
ISBN Number | 978-3-86541-358-1 |
Improving File Tree Traversal Performance by Scheduling I/O Operations in User Space
In Proceedings of the 28th IEEE International Performance Computing and Communications Conference (IPCCC). IEEE, 2009. Status: Published
Current in-kernel disk schedulers provide efficient means to optimize the order (and minimize disk seeks) of issued, in-queue I/O requests. However, they fail to optimize sequential multi-file operations, like traversing a large file tree, because only requests from one file are available in the scheduling queue at a time. We have therefore investigated a user-level I/O request sorting approach to reduce inter-file disk arm movements. This is achieved by allowing applications to utilize the placement of inodes and disk blocks to make a one-sweep schedule for all file I/Os requested by a process, i.e., data placement information is read first before issuing the low-level I/O requests to the storage system. Our experiments with a modified version of tar show reduced disk arm movements and large performance improvements.
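The idea of collecting placement information first and only then issuing reads can be sketched in user space as follows. This is a simplified illustration rather than the paper's modified tar: it sorts by inode number alone, on the assumption that inode numbers roughly follow on-disk placement (which holds on some file systems but not all), and it does not query disk block addresses. The function names `files_in_inode_order` and `read_tree` are hypothetical.

```python
import os
import stat


def files_in_inode_order(root):
    """Return the regular files under root, sorted by inode number."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # metadata only, no file data is read
            except OSError:
                continue  # file vanished or is unreadable, skip it
            if stat.S_ISREG(info.st_mode):
                entries.append((info.st_ino, path))
    entries.sort()  # one pass in ascending inode order
    return [path for _inode, path in entries]


def read_tree(root):
    """Read every regular file in inode order and return the byte total."""
    total = 0
    for path in files_in_inode_order(root):
        with open(path, "rb") as handle:
            total += len(handle.read())
    return total
```

Compared with reading files in directory-listing order, the only change is the sort between the metadata pass and the data pass; whether it helps depends on the file system and on the storage device having seek costs at all.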
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2009 |
Conference Name | Proceedings of the 28th IEEE International Performance Computing and Communications Conference (IPCCC) |
Pagination | 145-152 |
Publisher | IEEE |
ISBN Number | 978-1-4244-5736-6 |
Limits of Work-Stealing Scheduling
In Job Scheduling Strategies for Parallel Processing (14th International Workshop, 2009). Springer Berlin / Heidelberg, 2009. Status: Published
The number of applications with many parallel cooperating processes is steadily increasing, and developing efficient runtimes for their execution is an important task. Several frameworks have been developed, such as MapReduce and Dryad, but developing scheduling mechanisms that take into account processing and communication requirements is hard. In this paper, we explore the limits of the work-stealing scheduler, which has empirically been shown to perform well, and evaluate load-balancing based on graph partitioning as an orthogonal approach. All the algorithms are implemented in our Nornir runtime system, and our experiments on a multi-core workstation show that work stealing degrades mainly when very little processing time, which we quantify exactly, is performed per message. This is the type of workload in which graph partitioning has the potential to achieve better performance than work-stealing.
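For readers unfamiliar with work stealing, the basic policy can be sketched as a small lock-step simulation. This is not the Nornir scheduler: the function `run_work_stealing`, the random victim selection, and the choice to start all tasks on worker 0 are assumptions made for illustration. Each worker takes the newest task from its own deque, and an idle worker steals the oldest task from another worker.

```python
import random
from collections import deque


def run_work_stealing(tasks, n_workers, seed=0):
    """Simulate work stealing; return (tasks run per worker, steal count)."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_workers)]
    deques[0].extend(tasks)  # assumption: all work starts on worker 0
    executed = [[] for _ in range(n_workers)]
    steals = 0
    while any(deques):  # an empty deque is falsy
        for worker in range(n_workers):
            if deques[worker]:
                task = deques[worker].pop()  # owner takes its newest task
            else:
                victims = [v for v in range(n_workers)
                           if v != worker and deques[v]]
                if not victims:
                    continue  # nothing to steal in this round
                victim = rng.choice(victims)
                task = deques[victim].popleft()  # thief takes the oldest task
                steals += 1
            executed[worker].append(task)
    return executed, steals
```

In this sketch a steal costs nothing, so it cannot reproduce the degradation the abstract describes; in a real runtime each steal involves synchronization and moves the task away from its communication partners, which matters most when each message carries very little processing.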
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2009 |
Conference Name | Job Scheduling Strategies for Parallel Processing (14th International Workshop, 2009) |
Pagination | 280-299 |
Publisher | Springer Berlin / Heidelberg |
ISBN Number | 978-3-642-04632-2 |
DOI | 10.1007/978-3-642-04633-9_15 |
Proceedings, refereed
Transparent Protocol Translation and Load Balancing on a Network Processor in a Media Streaming Scenario
In Network and Operating System Support for Digital Audio and Video (NOSSDAV 2008). ACM, 2008. Status: Published
Today, major newspapers and TV stations make live and on-demand audio/video content available, video-on-demand services are becoming common and even personal media are frequently uploaded to streaming sites. The discussion about the best transport protocol for streaming has been going on for years. Currently, HTTP-streaming is usual although the transport of streaming media data over TCP is hindered by TCP's probing behavior, which results in the rapid reduction and slow recovery of the packet rates. On the other hand, UDP has been criticized for being unfair against TCP, and it is therefore often blocked by access network providers. To exploit benefits of both TCP and UDP, we have implemented a proxy that performs transparent protocol translation in such a way that the video stream is delivered to clients in a TCP-compatible and TCP-friendly way, but with UDP-like smoothness. The translation is related to multicast-to-unicast translation and to voice-over-IP proxies that translate between UDP and TCP. Furthermore, it is also similar to the use of proxy caching that ISPs employ to reduce bandwidth demands. The unique advantage of our approach is that we avoid full-featured TCP handling on the proxy server but still achieve live protocol translation at line-speed in a TCP-compliant, TCP-friendly manner. Although we discard packets just like a sender of non-adaptive video over TCP, we achieve higher user-perceived quality because our proxy can avoid receive queue underflows in the proxy, while also achieving the same average bandwidth as a TCP connection between proxy and client. In this demo, we present our prototype implemented on an Intel IXP2400 network processor. The prototype proxy does not buffer outgoing packets, yielding data loss in case of a congested TCP side. 
Comparing HTTP-streaming from a web-server and RTP/UDP-streaming from a video server shows that, in case of some loss, our solution using UDP from the server and a proxy that translates to TCP delivers a smoother stream at playout rate while the end-to-end TCP stream oscillates heavily.
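The drop-instead-of-buffer behavior mentioned in this abstract can be sketched as follows. This illustrates only the forwarding policy, not the IXP2400 prototype: the prototype avoids full-featured TCP handling on the proxy, whereas this sketch relies on the operating system's socket layer. The function name `forward_or_drop` is hypothetical.

```python
import socket


def forward_or_drop(stream_sock, payload):
    """Try to forward one payload on a non-blocking stream socket.

    Returns True if the whole payload was accepted by the socket and
    False if it was dropped because the send buffer had no room.
    """
    try:
        sent = stream_sock.send(payload)
    except BlockingIOError:
        return False  # congested receiver side: drop, do not queue
    # Simplification: a partial write also counts as a drop. A real proxy
    # would need framing so the client can skip the truncated payload.
    return sent == len(payload)
```

A caller would receive datagrams from the server side and pass each one to `forward_or_drop` on a socket that has been put into non-blocking mode with `setblocking(False)`; payloads that do not fit are lost rather than delayed, which keeps latency bounded at the cost of data loss.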
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2008 |
Conference Name | Network and Operating System Support for Digital Audio and Video (NOSSDAV 2008) |
Pagination | 129-130 |
Date Published | May |
Publisher | ACM |
ISBN Number | 978-1-60588-157-6 |
Proceedings, refereed
Transparent Protocol Translation for Streaming
In ACM International Multimedia Conference (ACM MM). ACM, 2007. Status: Published
The transport of streaming media data over TCP is hindered by TCP's probing behavior that results in the rapid reduction and slow recovery of the packet rates. On the other side, UDP has been criticized for being unfair against TCP connections, and it is therefore often blocked out in the access networks. In this paper, we try to benefit from a combined approach using a proxy that transparently performs transport protocol translation. We translate HTTP requests by the client transparently into RTSP requests, and translate the corresponding RTP/UDP/AVP stream into the corresponding HTTP response. This enables the server to use UDP on the server side and TCP on the client side. This is beneficial for the server side that scales to a higher load when it doesn't have to deal with TCP. On the client side, streaming over TCP has the advantage that connections can be established from the client side, and data streams are passed through firewalls. Preliminary tests demonstrate that our protocol translation delivers a smoother stream compared to HTTP-streaming where the TCP bandwidth oscillates heavily.
Affiliation | Communication Systems |
Publication Type | Proceedings, refereed |
Year of Publication | 2007 |
Conference Name | ACM International Multimedia Conference (ACM MM) |
Pagination | 771-774 |
Date Published | September |
Publisher | ACM |
ISBN Number | 978-1-59593-702-5 |
Notes | (short paper) © ACM, (2007). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 15th international conference on Multimedia (2007), http://doi.acm.org/10.1145/1291233.1291407 |