Objective | Data | Method | Description |
---|---|---|---|
Recurrent somatic mutation identification | SNV | MutSigCV[48] | Uses coverage information and genomic features (e.g. DNA replication time) to estimate the background mutation rate of a gene. |
 |  | MuSiC[49] | Uses a per-gene background mutation rate; allows for user-defined regions of interest. |
 |  | Youn et al.[51] | Includes predicted impact on protein function in determining recurrent mutations. |
 |  | Sjöblom et al.[52] | Defines a cancer mutation prevalence score for each gene. |
 |  | DrGaP[139] | Uses a Bayesian approach to estimate the background mutation rate; helpful for cancer types with a low mutation rate. |
 | CNA |  | Uses ‘peel-off’ techniques to find smaller recurrent aberrations inside larger aberrations. |
 |  | CMDS[62] | Identifies recurrent CNAs from unsegmented data. |
 |  | ADMIRE[65] | Multi-scale smoothing of copy number profiles. |
Functional impact prediction | General | SIFT[72] | Uses conservation of amino acids to predict functional impact of a non-synonymous amino-acid change. |
 |  | Polyphen-2[74] | Infers functional impact of non-synonymous amino-acid changes through alignments of related peptide sequences and a machine-learning-based probabilistic classifier. |
 |  | MutationAssessor[75] | Uses protein homologs to calculate a score based on the divergence in conservation caused by an amino-acid change. |
 |  | PROVEAN[73] | Benchmarks favorably against MutationAssessor, Polyphen-2 and SIFT. |
 | Cancer-specific | CHASM[77] | Uses a machine-learning approach to classify mutations as drivers or passengers based on sequence conservation, protein domains, and protein structure. |
 |  | Oncodrive-FM[79] | Combines scores from SIFT, Polyphen-2, and MutationAssessor into a single ranking. |
 | Positional or structural clustering | NMC[83] | Finds clusters of non-synonymous mutations across patients. Typically used with missense mutations to detect so-called ‘activating’ mutations. |
 |  | iPAC[84] | Extends the NMC approach to search for clusters of mutations in three-dimensional space using crystal structures of proteins. |
Pathway analysis and combinations of mutations | Known pathways | GSEA[92] | A general technique for testing ranked lists of genes for enrichment in known gene sets. Can be used on rankings derived from significance of observed mutations. |
 |  | PathScan[95] | Finds gene sets (pathways) with an excess of mutations by combining P-values of enrichment across samples. |
 |  | Patient-oriented gene sets[94] | Tests known pathways using a binary indicator for a pathway in each patient. |
 | Interaction networks | NetBox[140] | Finds network modules in a user-provided list of genes. Significance depends only on the topology of the genes in the network, and not on mutation scores. |
 |  | HotNet[102] | Finds subnetworks with significantly more aberrations than would be expected by chance, using both network topology and user-defined gene or protein scores. |
 |  | MEMo[104] | Finds subnetworks whose interacting pairs of genes have mutually exclusive aberrations[105]; recommends including only recurrent SNVs and CNAs in the analysis. |
 | De novo | Dendrix[102] | Identifies groups of genes with mutually exclusive aberrations. |
 |  | Multi-Dendrix[112] | Simultaneously finds multiple groups of genes with mutually exclusive aberrations. |
 |  | RME[110] | Finds groups of genes with mutually exclusive aberrations by building from gene pairs; best results obtained when restricting to genes with high mutation frequencies (e.g. > 10%). |
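
The mutual-exclusivity idea behind the de novo methods in the last rows of the table can be illustrated with a toy score. The sketch below is an assumption-laden illustration, not the published algorithm of any listed tool: it rewards a gene set for covering many patients and penalizes it when the same patient carries aberrations in more than one gene of the set. The names `exclusivity_score`, `best_gene_set`, and the gene and patient labels are hypothetical.

```python
from itertools import combinations


def exclusivity_score(mutations, genes):
    """Toy score: patients covered by the gene set minus overlapping hits.

    mutations: dict mapping gene name -> set of patient IDs with an
               aberration in that gene (hypothetical input format).
    genes:     iterable of gene names forming the candidate set.
    """
    covered = set()
    total_hits = 0
    for gene in genes:
        patients = mutations.get(gene, set())
        covered |= patients
        total_hits += len(patients)
    coverage = len(covered)
    # Extra hits contributed by patients mutated in more than one gene.
    overlap = total_hits - coverage
    return coverage - overlap


def best_gene_set(mutations, k):
    """Exhaustively pick the size-k gene set with the highest toy score."""
    candidates = combinations(sorted(mutations), k)
    return max(candidates, key=lambda gs: exclusivity_score(mutations, gs))


# Made-up data: GENE_A and GENE_B never co-occur; GENE_A and GENE_C share p1.
toy = {
    "GENE_A": {"p1", "p2", "p3"},
    "GENE_B": {"p4", "p5"},
    "GENE_C": {"p1", "p6"},
}

print(best_gene_set(toy, 2))
```

Exhaustive enumeration is only feasible for a handful of genes, since the number of candidate sets grows combinatorially with the gene count and set size. Any real analysis would need a heuristic or sampling-based search and a significance test, neither of which this sketch attempts.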