
Table 2 Composition of the datasets used as proxies to compare the performance of transformed and original scores at assessing the functional impact of cancer somatic mutations

From: Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation

| Name | Source | Positives | Negatives | N positives | N negatives |
|---|---|---|---|---:|---:|
| COSMIC2+/1 | COSMIC | Mutations that appear in 2 or more samples | Mutations that appear in 1 sample | 4,012 | 39,854 |
| COSMIC5+/1 | COSMIC | Mutations that appear in 5 or more samples | Mutations that appear in 1 sample | 1,480 | 39,854 |
| COSMIC2+/Pol | COSMIC/HumVar [2] | Mutations that appear in 2 or more samples | Known polymorphisms | 4,012 | 8,257 |
| COSMIC5+/Pol | COSMIC/HumVar | Mutations that appear in 5 or more samples | Known polymorphisms | 1,480 | 8,257 |
| COSMICD/O | COSMIC | COSMIC mutations included in the manually curated list of drivers used to train CHASM [5] | All other COSMIC mutations (the positive subset excluded) | 2,185 | 41,681 |
| COSMICD/Pol | COSMIC/HumVar | Mutations included in the manually curated list of drivers used to train CHASM | Known polymorphisms | 2,185 | 8,257 |
| COSMICCGC/nonCGC | COSMIC | COSMIC mutations in genes included in the Cancer Gene Census [13] | Non-recurrent COSMIC mutations in genes not included in the Cancer Gene Census | 4,685 | 35,907 |
| WG2+/1 | Pooled cancer somatic mutations | Mutations that appear in 2 or more samples | Mutations that appear in 1 sample | 1,031 | 26,025 |
| WGCGC/nonCGC | Pooled cancer somatic mutations | Mutations in genes included in the Cancer Gene Census [13] | Non-recurrent mutations in genes not included in the Cancer Gene Census | 1,412 | 24,837 |

  1. HumVar is a dataset of disease-related SNVs and neutral polymorphisms [2]. WG (whole genome) is a dataset of somatic mutations pooled from different tumor exome-sequencing projects (Table 1).
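The recurrence-based datasets (the "2+/1" and "5+/1" rows) label a mutation by the number of samples in which it appears. A minimal sketch of that split follows, assuming the input is a list of (sample, mutation) observations; the function names, input format and toy identifiers are hypothetical and not taken from the paper. Under this rule, mutations seen in 2 to 4 samples fall into neither set of COSMIC5+/1, which is consistent with COSMIC2+/1 and COSMIC5+/1 sharing the same 39,854 negatives. The sketch also computes the fraction of positives in each dataset directly from the counts in the table.

```python
from collections import defaultdict

def recurrence_split(observations, min_recurrence):
    """Hypothetical helper: split mutations by how many distinct samples carry them.

    observations: iterable of (sample_id, mutation_id) pairs (assumed format).
    Positives: mutations found in >= min_recurrence distinct samples.
    Negatives: mutations found in exactly 1 sample.
    Mutations in 2..min_recurrence-1 samples are left out of both sets.
    """
    if min_recurrence < 2:
        raise ValueError("min_recurrence must be at least 2 to keep the sets disjoint")
    samples_per_mutation = defaultdict(set)
    for sample_id, mutation_id in observations:
        # Count distinct samples, so a mutation listed twice in one sample counts once.
        samples_per_mutation[mutation_id].add(sample_id)
    positives = {m for m, s in samples_per_mutation.items() if len(s) >= min_recurrence}
    negatives = {m for m, s in samples_per_mutation.items() if len(s) == 1}
    return positives, negatives

# (N positives, N negatives) as reported in Table 2.
TABLE2_COUNTS = {
    "COSMIC2+/1": (4012, 39854),
    "COSMIC5+/1": (1480, 39854),
    "COSMIC2+/Pol": (4012, 8257),
    "COSMIC5+/Pol": (1480, 8257),
    "COSMICD/O": (2185, 41681),
    "COSMICD/Pol": (2185, 8257),
    "COSMICCGC/nonCGC": (4685, 35907),
    "WG2+/1": (1031, 26025),
    "WGCGC/nonCGC": (1412, 24837),
}

def positive_fraction(name):
    """Fraction of positives among all mutations in the named dataset."""
    n_pos, n_neg = TABLE2_COUNTS[name]
    return n_pos / (n_pos + n_neg)

if __name__ == "__main__":
    for name in TABLE2_COUNTS:
        print(f"{name}: {positive_fraction(name):.1%} positives")
```

The printed fractions make the class imbalance of each dataset explicit; for example, COSMICD/O and COSMIC2+/1 cover the same total of 43,866 COSMIC mutations but split them differently.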