Fig. 1 | Genome Medicine


From: Evaluating the transcriptional fidelity of cancer models


CancerCellNet (CCN) workflow, training, and performance. a Schematic of CCN usage. CCN was designed to assess and compare the expression profiles of cancer models such as CCLs, PDXs, GEMMs, and tumoroids with those of native patient tumors. To use the trained classifier, CCN takes query samples as input (e.g. expression profiles from CCLs, PDXs, GEMMs, or tumoroids) and generates a classification profile for each query sample. The columns of the classification heatmap represent sample annotations, and the rows represent cancer types. Each cell is colored from black to yellow, representing the lowest (e.g. 0) to the highest (e.g. 1) classification score. b Schematic of the CCN training process. CCN uses patient tumor expression profiles of 22 cancer types from TCGA as training data. First, CCN identifies n genes that are upregulated, n that are downregulated, and n that are relatively invariant in each tumor type versus all of the others. Then, CCN performs a pair transform on these genes and selects the most discriminative set of m gene pairs for each cancer type as features (or predictors) for the random forest classifier. Lastly, CCN trains a multi-class random forest classifier on the gene-pair-transformed training data. c Parameter optimization strategy. To find the values of n and m that maximized classifier performance, as measured by the area under the precision-recall curve (AUPRC), each parameter set was evaluated with five cross-validations, in each of which 2/3 of the TCGA data was used for training and 1/3 for validation. d Mean and standard deviation of the AUPRC of classifiers across 50 cross-validations with the optimal parameter set. e AUPRCs of the final CCN classifier when applied to independent patient tumor data from ICGC.
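The pair transform and pair selection in panel b can be illustrated with a minimal Python sketch. It assumes that each gene pair becomes a binary feature (1 if the first gene is expressed above the second in a sample, else 0), and it ranks pairs with a simple, hypothetical score: the absolute difference between in-class and out-of-class frequencies. The function names `pair_transform` and `top_pairs`, the dict input format, and the toy genes and labels are illustrative assumptions, not CCN's actual code or selection criterion.

```python
from itertools import combinations


def pair_transform(expr, genes):
    """Turn one sample's expression values into binary gene-pair features.

    expr: dict mapping gene name -> expression value (assumed input format).
    genes: the selected up-, down- and invariant genes for the classifier.
    Each feature "a_b" is 1 if gene a is expressed above gene b, else 0.
    """
    return {f"{a}_{b}": int(expr[a] > expr[b]) for a, b in combinations(genes, 2)}


def top_pairs(samples, labels, genes, target, m):
    """Keep the m pairs that best separate `target` from all other types.

    Illustrative scoring (an assumption, not necessarily CCN's criterion):
    absolute difference between the in-class and out-of-class fraction of
    samples in which gene a > gene b.
    """
    transformed = [pair_transform(s, genes) for s in samples]
    in_class = [t for t, y in zip(transformed, labels) if y == target]
    out_class = [t for t, y in zip(transformed, labels) if y != target]
    scores = {}
    for pair in transformed[0]:
        frac_in = sum(t[pair] for t in in_class) / len(in_class)
        frac_out = sum(t[pair] for t in out_class) / len(out_class)
        scores[pair] = abs(frac_in - frac_out)
    # sorted() is stable, so ties keep their original pair order
    return sorted(scores, key=scores.get, reverse=True)[:m]


# Toy usage with hypothetical genes A, B, C and cancer types X, Y
samples = [
    {"A": 5, "B": 1, "C": 3},
    {"A": 6, "B": 2, "C": 1},
    {"A": 1, "B": 5, "C": 3},
    {"A": 2, "B": 6, "C": 1},
]
labels = ["X", "X", "Y", "Y"]
print(pair_transform(samples[0], ["A", "B", "C"]))  # {'A_B': 1, 'A_C': 1, 'B_C': 0}
print(top_pairs(samples, labels, ["A", "B", "C"], "X", 1))  # ['A_B']
```

Binary pair features depend only on the relative order of genes within a sample, which is one reason rank-based pair transforms are often used when comparing expression data across platforms or sample sources. The selected pairs would then serve as inputs to a multi-class random forest.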
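The evaluation in panels c and d, which involves repeated 2/3 training and 1/3 validation splits scored by AUPRC, can be sketched in a similar way. This sketch assumes a plain random split (the caption does not say whether splits were stratified by cancer type), and it estimates AUPRC as average precision, which is a common estimator but may differ from the exact computation used for CCN. The helper names `split_two_thirds` and `average_precision`, along with the run values, are hypothetical.

```python
import random
import statistics


def split_two_thirds(items, seed):
    """Randomly split items into ~2/3 training and ~1/3 validation sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def average_precision(scores, labels):
    """Estimate AUPRC for one cancer type as average precision.

    scores: classification scores for that type (higher = more confident).
    labels: 1 if the sample truly belongs to that type, else 0.
    Tied scores keep their input order (no tie correction) in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each new recall level
    return total / n_pos if n_pos else 0.0


train, valid = split_two_thirds(range(9), seed=0)
print(len(train), len(valid))  # 6 3
ap = average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(round(ap, 4))  # 0.8333
# Summarize repeated cross-validation runs (hypothetical AUPRC values)
runs = [0.91, 0.88, 0.93]
print(statistics.mean(runs), statistics.stdev(runs))
```

In a parameter search like the one in panel c, each candidate (n, m) would be scored by averaging such AUPRC values over its cross-validation runs. The mean and standard deviation shown in panel d summarize the same quantity over 50 runs.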
