Figure 2 | Genome Medicine

From: Estimating exome genotyping accuracy by comparing to data from large scale sequencing projects

Analysis of low-quality samples. (A) Comparison of variant allele frequencies in different sample sets. Pairs of genotype frequencies were computed for the exomes of the reference set (CEU samples from the 1000 Genomes Project) and compared with low-quality test samples of the same ethnicity. The ellipse indicates twice the standard deviation, assuming a binomial model for the allele frequency p. Variants in the lower right quadrant were called with a lower probability in our test samples and have a GC content that deviates from the expected mean. (B) GC content at false-negative positions. Variants that are underrepresented in exomes that differ strongly from the high-quality reference set are overrepresented in exome regions with high GC content (violet curve). The green curve shows the GC content distribution expected for an equal number of variants drawn at random from the exome. (C) Coverage versus GC content. The mean sequence coverage of the consensus exome varies with the GC content of the target region. Overall coverage for an exemplary sample from the 1000 Genomes Project (NA06986) was higher than for test samples 1 and 2. Test sample 1 has particularly low coverage in regions with extreme GC content, suggesting a higher error rate.
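The quantities behind the three panels can be illustrated with a short sketch. The caption does not say exactly how the ellipse in panel A was constructed, so the following assumes (hypothetically) that it is centred on the pooled allele frequency p of both sample sets, with semi-axes of twice the binomial standard deviation sqrt(p(1 - p)/n) in each set, where n is the number of sequenced alleles. The helper names `inside_two_sd_ellipse`, `gc_content` and `mean_coverage_by_gc` are illustrative only and are not taken from the authors' pipeline.

```python
import math


def binomial_sd(p, n_alleles):
    """Standard deviation of an observed allele frequency under a binomial model."""
    return math.sqrt(p * (1.0 - p) / n_alleles)


def inside_two_sd_ellipse(p_ref, p_test, n_ref, n_test):
    """Hypothetical panel-A check: is the (p_ref, p_test) pair within 2 SD?

    Assumption: the ellipse is centred on the pooled frequency p, with
    semi-axes of 2 * binomial SD in the reference and the test sample set.
    """
    p = (p_ref * n_ref + p_test * n_test) / (n_ref + n_test)
    sd_ref = binomial_sd(p, n_ref)
    sd_test = binomial_sd(p, n_test)
    if sd_ref == 0.0 or sd_test == 0.0:
        # Pooled frequency is 0 or 1: both observations must then be identical.
        return p_ref == p_test
    z = ((p_ref - p) / (2 * sd_ref)) ** 2 + ((p_test - p) / (2 * sd_test)) ** 2
    return z <= 1.0


def gc_content(seq):
    """Fraction of G or C bases in a target-region sequence (panels B and C)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def mean_coverage_by_gc(targets, n_bins=10):
    """Mean coverage per GC-content bin, as plotted in panel C.

    targets: iterable of (gc_fraction, mean_coverage) pairs, one per region.
    Returns one value per bin, or None for bins without any region.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for gc, cov in targets:
        b = min(int(gc * n_bins), n_bins - 1)  # gc == 1.0 goes into the last bin
        sums[b] += cov
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

For example, a variant at frequency 0.5 in 200 reference alleles but only 0.05 in 40 test alleles falls well outside the assumed ellipse, which is the kind of under-called variant the caption links to deviating GC content. Binning by GC fraction, rather than fitting a curve, keeps the coverage profile easy to compare between samples such as NA06986 and the test samples.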