Authors: G. P. Bhandari, A. Naseer, and L. Moonen
Editors: X. Xia and S. Amasaki
Title: CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software
Affiliation: Software Engineering
Project(s): Data-Driven Software Engineering Department
Publication Type: Proceedings, refereed
Year of Publication: 2021
Conference Name: 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2021)
Date Published: 08/2021
ISBN Number: 978-1-4503-8680-7/21/08
Keywords: dataset, security vulnerabilities, software repository mining, source code repair, vulnerability classification, vulnerability prediction

Data-driven research on the automated discovery and repair of
security vulnerabilities in source code requires comprehensive
datasets of real-life vulnerable code and their fixes. To assist
in such research, we propose a method to automatically collect and
curate a comprehensive vulnerability dataset from Common Vulnerabilities
and Exposures (CVE) records in the public National Vulnerability
Database (NVD). We implement our approach in a fully automated
dataset collection tool and share an initial release of the resulting
vulnerability dataset named CVEfixes.

The CVEfixes collection tool automatically fetches all available
CVE records from the NVD, gathers the vulnerable code and corresponding
fixes from associated open-source repositories, and organizes the
collected information in a relational database. Moreover, the
dataset is enriched with metadata such as programming language,
and detailed code and security metrics at five levels of abstraction.
The collection can easily be repeated to stay up to date with newly
discovered or patched vulnerabilities. The initial release of
CVEfixes spans all published CVEs up to 9 June 2021, covering 5365
CVE records for 1754 open-source projects that were addressed in a
total of 5495 vulnerability-fixing commits.
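
As a rough illustration of how CVE records, repositories, and fixing commits might be linked in a relational database, the following minimal sketch builds a small in-memory SQLite database and counts fixes per programming language. The table names, column names, and sample rows are hypothetical placeholders chosen for this example; they are not the schema or data shipped with CVEfixes.

```python
import sqlite3

# Hypothetical three-table layout. All table and column names are
# illustrative assumptions, not the actual CVEfixes schema.
SCHEMA = """
CREATE TABLE cve (
    cve_id      TEXT PRIMARY KEY,
    description TEXT
);
CREATE TABLE repository (
    repo_url TEXT PRIMARY KEY,
    language TEXT
);
CREATE TABLE fix_commit (
    cve_id      TEXT NOT NULL REFERENCES cve(cve_id),
    repo_url    TEXT NOT NULL REFERENCES repository(repo_url),
    commit_hash TEXT NOT NULL,
    PRIMARY KEY (cve_id, repo_url, commit_hash)
);
"""

REPO_A = "https://example.org/demo/project-a"  # placeholder URL
REPO_B = "https://example.org/demo/project-b"  # placeholder URL


def build_demo_db():
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cve VALUES (?, ?)",
        [
            ("CVE-0000-0001", "placeholder record A"),
            ("CVE-0000-0002", "placeholder record B"),
        ],
    )
    conn.executemany(
        "INSERT INTO repository VALUES (?, ?)",
        [(REPO_A, "C"), (REPO_B, "Python")],
    )
    # One CVE is linked to two commits here, so the number of commits
    # can exceed the number of CVE records.
    conn.executemany(
        "INSERT INTO fix_commit VALUES (?, ?, ?)",
        [
            ("CVE-0000-0001", REPO_A, "aaa111"),
            ("CVE-0000-0001", REPO_A, "aaa222"),
            ("CVE-0000-0002", REPO_B, "bbb111"),
        ],
    )
    conn.commit()
    return conn


def fixes_per_language(conn):
    """Return (language, distinct CVE count, commit count), sorted by language."""
    return conn.execute(
        """
        SELECT r.language, COUNT(DISTINCT f.cve_id), COUNT(*)
        FROM fix_commit AS f
        JOIN repository AS r ON r.repo_url = f.repo_url
        GROUP BY r.language
        ORDER BY r.language
        """
    ).fetchall()
```

Calling `fixes_per_language(build_demo_db())` on this toy data groups the placeholder commits by repository language. Keeping commits in their own table, separate from CVE records, is one way to represent the case where a single CVE is addressed by more than one commit.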

CVEfixes supports various types of data-driven software security
research, such as vulnerability prediction, vulnerability classification,
vulnerability severity prediction, analysis of vulnerability-related
code changes, and automated vulnerability repair.

Citation Key: bhandari2021:cvefixes
