Query OSV API for individual package vulnerabilities

Connects to the OSV API and queries vulnerabilities for the specified packages. Unlike the other query functions, osv_query returns only the content and not the response object. By default, all vulnerabilities are returned for any version of the package flagged in OSV. The results can be subset manually or via the all_affected parameter.

Usage

osv_query(
  name = NULL,
  version = NULL,
  ecosystem = NULL,
  all_affected = TRUE,
  cache = TRUE,
  ...
)

Arguments

name: Character vector of package names.
version: Character vector of package versions, NA if ignoring versions.
ecosystem: Character vector of ecosystem(s) within which the package(s) exist.
all_affected: Boolean value. If TRUE, return all package results found for each vulnerability discovered.
cache: Boolean value determining whether to use a cached version of the function and API results.
...: Any other parameters to pass to nested functions.

Value

A data.frame with query results parsed.

Details

Since the query and batchquery API endpoints have different outputs, this function aligns their contents into a list of vulnerabilities. For 'query' this means flattening the returned list once; for 'batchquery' the returned IDs are used to fetch additional vulnerability information, which is then flattened to a list.
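
As an illustration of the "flatten once" step, here is a minimal base R sketch. The shape of query_response (a single top-level vulns element holding the records) and the placeholder IDs are assumptions made for the example; this is not the package's internal code.

```r
# Hypothetical parsed 'query' response: one top-level "vulns" element
# wrapping a list of vulnerability records (placeholder data).
query_response <- list(
  vulns = list(
    list(id = "EXAMPLE-0001", summary = "first placeholder record"),
    list(id = "EXAMPLE-0002", summary = "second placeholder record")
  )
)

# Flatten exactly one level so each element is a single vulnerability record.
# unname() drops the "vulns" name so it is not prefixed onto the result.
vuln_list <- unlist(unname(query_response), recursive = FALSE)

# Each element can now be handled uniformly, e.g. to collect the IDs.
vuln_ids <- vapply(vuln_list, function(v) v$id, character(1))
```

Using recursive = FALSE matters here: a fully recursive unlist() would collapse each record into loose character values instead of keeping one list element per vulnerability.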

If only an ecosystem parameter is provided, all vulnerabilities for that selection will be downloaded from the OSV database and parsed into a tidied table. Since some vulnerabilities can exist across ecosystems, all_affected may need to be set to FALSE.

Since the OSV database is organized by vulnerability, the returned content may have duplicate package details as the same package, and possibly its version, may occur within several different reported vulnerabilities. To avoid this behaviour, set the all_affected parameter to FALSE.
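
For readers who prefer to subset manually rather than set all_affected = FALSE, the following base R sketch shows one possible approach. The table and its column names (id, name, ecosystem) are hypothetical stand-ins for a query result, so adjust them to the columns actually returned.

```r
# Hypothetical result table: one row per package affected by each vulnerability.
pkg_vul <- data.frame(
  id = c("EXAMPLE-0001", "EXAMPLE-0001", "EXAMPLE-0002"),
  name = c("dask", "distributed", "dask"),
  ecosystem = c("PyPI", "PyPI", "PyPI"),
  stringsAsFactors = FALSE
)

# Keep only rows for the package that was actually queried.
queried <- "dask"
only_queried <- pkg_vul[pkg_vul$name %in% queried, , drop = FALSE]

# Collapse repeated package details to one row per package and ecosystem.
unique_pkgs <- unique(only_queried[, c("name", "ecosystem")])
```

The first subset keeps every vulnerability for the queried package; the second answers the narrower question of which packages are affected at all.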

Due to variations in formatting from the OSV API, some responses list affected version ranges rather than specific versions. Filtering currently does not apply to ranges, so all versions affected within a range may be returned. If you suspect ranges are used instead of specific version codes, examine the response object using lower-level functions like osv_query_1().

To speed up the process for large ecosystems, you can set future::plan() for parallelization; this will be respected via the furrr package. The default is to run sequentially. Supporting queries across mixed ecosystems comes at a performance cost. For packages with many vulnerabilities, it can be faster to query those packages separately so that all of their vulnerabilities are pulled at once rather than individually. Alternative approaches may be implemented in future versions.
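
A minimal sketch of opting into parallel execution is shown below. It assumes the future package is installed and that osv_query is available in the session; the worker count of 2 is arbitrary. The guard keeps the block from contacting the API outside an interactive session.

```r
# Tracks whether the guarded query actually ran in this session.
ran_query <- FALSE

if (interactive() &&
    requireNamespace("future", quietly = TRUE) &&
    exists("osv_query")) {
  # Switch to background R sessions; plan() returns the previous plan.
  old_plan <- future::plan(future::multisession, workers = 2)

  # Ecosystem-only query: downloads and parses all PyPI vulnerabilities.
  pypi_vul <- osv_query(ecosystem = "PyPI", all_affected = FALSE)
  ran_query <- TRUE

  # Restore the previous plan so later code is unaffected.
  future::plan(old_plan)
}
```

Restoring the earlier plan is a courtesy to the rest of the session rather than a requirement of osv_query.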

Examples

if (FALSE) { # interactive()

# Single package
pkg_vul <- osv_query('dask', ecosystem = 'PyPI')

# Batch query
name_vec <- c('dask', 'dash')
ecosystem_vec <- rep('PyPI', length(name_vec))
pkg_vul <- osv_query(name_vec, ecosystem = ecosystem_vec)
}