
Table 1 Methods we compared with the INSPIRE framework. To our knowledge, there are no published methods for learning modules and their dependencies that can handle variable discrepancy. We therefore adapted the following five state-of-the-art methods that run on a single dataset: GLasso, the standard graphical lasso [54]; UGL, unknown group L1 regularization [62]; SLFA, structured latent factor analysis [22]; WGCNA, weighted gene co-expression network analysis [8]; and MGL, the module graphical lasso [11] (see “Methods” for details). We adapted the input datasets so that these methods can be applied to datasets with variable discrepancy (Additional file 2: Figure S1B). “---1” learns a model from Dataset1 only, which contains all genes. “Inter---” learns a model from the data on the overlapping genes (blue-shaded region in Fig. 1) and assigns the remaining genes to the learned modules with a k-nearest neighbor approach (i.e., based on the Euclidean distance between each gene’s expression and the expression of each module). “Imp---” imputes the missing values in Dataset2 and learns a model from the imputed data (see “Methods” for details on imputation). These adaptations yield 13 competitors: (1) GLasso1; (2) ImpGLasso; (3) UGL1; (4) ImpUGL; (5) WGCNA1; (6) InterWGCNA; (7) ImpWGCNA; (8) SLFA1; (9) InterSLFA; (10) ImpSLFA; (11) MGL1; (12) InterMGL; and (13) ImpMGL. In the experiments on synthetic data, we compared against all 13 competitors, whereas in the experiments on two genome-wide ovarian cancer gene expression datasets, discussed in the subsequent sections, we used only the scalable methods (see Additional file 3: Figure S2). These methods are indicated by the purple-shaded region in the table (the rows marked “Yes” in the Scalability column).
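The “Inter---” step that assigns non-overlapping genes to learned modules can be sketched as follows. This is a minimal illustration, not the authors’ implementation: it assumes that a module’s expression is the mean profile of its member genes and that each gene goes to the single nearest module (k = 1). The function name assign_to_modules and its arguments are hypothetical.

```python
import numpy as np


def assign_to_modules(expr_rest, expr_overlap, module_labels):
    """Assign genes measured only in Dataset1 to modules learned on overlap genes.

    expr_rest:     (n_rest_genes, n_samples) expression of genes outside the overlap.
    expr_overlap:  (n_overlap_genes, n_samples) expression of the overlap genes,
                   measured on the same samples as expr_rest.
    module_labels: (n_overlap_genes,) module index learned for each overlap gene.
    """
    modules = np.unique(module_labels)
    # Assumption: a module's expression profile is the mean of its member genes.
    profiles = np.stack(
        [expr_overlap[module_labels == m].mean(axis=0) for m in modules]
    )
    # Euclidean distance from every remaining gene to every module profile.
    dists = np.linalg.norm(expr_rest[:, None, :] - profiles[None, :, :], axis=2)
    # Assumption: k = 1, i.e. each gene joins its nearest module.
    return modules[np.argmin(dists, axis=1)]


# Toy example: two modules, two genes to assign.
overlap = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
rest = np.array([[0.2, 0.1], [4.8, 5.9]])
print(assign_to_modules(rest, overlap, labels))
```

Using module mean profiles keeps the cost linear in the number of modules rather than in the number of overlap genes. A gene-level k-nearest-neighbor vote is a plausible alternative reading of the description.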
The “Inter---” adaptation is not applicable to GLasso and UGL, because these methods learn a network of genes rather than modules, and it is not obvious how to connect the genes present only in Dataset1 to the learned network. We do not consider an adaptation that applies the methods to Dataset2 only (“---2”). Apart from the overlapping genes, Dataset2 contains no genes (in the synthetic data experiments) or very few genes (in the experiments with genome-wide expression data). As a result, “---2”, which uses only the samples from Dataset2, is unlikely to outperform “Inter---”, which uses all samples.
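The count of 13 competitors follows from crossing the five base methods with the three input adaptations and dropping “Inter---” for the two gene-level methods. The short sketch below reproduces that bookkeeping. The variable and function names are illustrative only, and the scalable set is read off the Scalability column of the table.

```python
BASE_METHODS = ["GLasso", "UGL", "WGCNA", "SLFA", "MGL"]
# GLasso and UGL learn gene-level networks, so "Inter---" does not apply to them.
GENE_LEVEL = {"GLasso", "UGL"}
# Methods marked "Yes" in the Scalability column.
SCALABLE = {"WGCNA", "MGL"}


def competitors(methods=BASE_METHODS):
    """List the adapted competitors as (base method, competitor name) pairs."""
    out = []
    for m in methods:
        out.append((m, f"{m}1"))              # "---1": learn from Dataset1 only
        if m not in GENE_LEVEL:
            out.append((m, f"Inter{m}"))      # "Inter---": overlap genes + kNN assignment
        out.append((m, f"Imp{m}"))            # "Imp---": impute Dataset2, then learn
    return out


all_names = [name for _, name in competitors()]
genome_wide = [name for m, name in competitors() if m in SCALABLE]
print(len(all_names), all_names)
print(len(genome_wide), genome_wide)
```

Running it lists the same 13 names, in the same order, as the enumeration above, and 6 of them come from the scalable methods.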

From: Extracting a low-dimensional description of multiple gene expression datasets reveals a potential driver for tumor-associated stroma in ovarian cancer

| Method | Description | Missing data: ---1 | Missing data: Inter--- | Missing data: Imp--- | Scalability (see Additional file 3: Figure S2) |
|---|---|---|---|---|---|
| GLasso | Standard graphical lasso [54] | GLasso1 | X | ImpGLasso | No |
| UGL | Unknown group L1 regularization [62] | UGL1 | X | ImpUGL | No |
| SLFA | Structured latent factor analysis [22] | SLFA1 | InterSLFA | ImpSLFA | No |
| WGCNA | Weighted gene co-expression network analysis [8] | WGCNA1 | InterWGCNA | ImpWGCNA | Yes |
| MGL | Module graphical lasso [11] | MGL1 | InterMGL | ImpMGL | Yes |

X: not applicable.