Fig. 3 | Genome Medicine

Fig. 3

From: New var reconstruction algorithm exposes high var sequence diversity in a single geographic location in Mali

Overview of the ETHA algorithm for reconstructing exon 1 sequences. Illumina and PacBio sequencing data are used together with previously characterized exon 1 sequences from VarDom [13] as inputs for reconstructing exon 1 sequences in clinical whole-genome assemblies. PacBio data are assembled, and exon 1 ends are identified by mapping known exon 1 sequences from VarDom onto the assembly (steps 1 and 2). Illumina data corresponding to var genes are identified by finding 71 bp segments (71mers) that contain var splice site sequences at the end of exon 1 in the assembly, then iteratively following possible continuations within the Illumina data, that is, new trusted 71mers that overlap previously identified var 71mers by 70 bp (steps 3 and 4). This k-mer walk is extended until a start methionine is reached (step 4) and is then repeated in the opposite direction, from the start methionine to the intron. The collected 71mers are then assembled by generating all possible paths within the de Bruijn graph of 71mers (step 5), and the paths are reconciled with the whole-genome assembly by choosing those that align best with it (steps 6–8). Data inputs are shown in white; processes are shown in gray. See “Methods” for additional details.
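
The k-mer walk in steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the ETHA implementation: the function names (`kmers`, `kmer_walk`), the `is_start` predicate, and the `max_kmers` guard are hypothetical, and treating any k-mer that begins with ATG as a start methionine is a deliberate simplification. The sketch only shows the idea of starting from seed k-mers at the end of exon 1 and repeatedly adding trusted k-mers that overlap an already collected k-mer by k-1 bases on the upstream side.

```python
from collections import deque

BASES = "ACGT"


def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_walk(seeds, trusted, k, is_start, max_kmers=100_000):
    """Collect trusted k-mers reachable from the seeds by upstream steps.

    seeds     -- k-mers assumed to lie at the end of exon 1 (hypothetical input)
    trusted   -- set of k-mers considered reliable in the short-read data
    is_start  -- predicate marking a k-mer where the walk should stop
                 (an assumed stand-in for "start methionine reached")
    max_kmers -- safety cap so repeats cannot make the walk grow unboundedly
    """
    visited = {s for s in seeds if s in trusted}
    queue = deque(visited)
    while queue and len(visited) < max_kmers:
        kmer = queue.popleft()
        if is_start(kmer):
            continue  # do not extend past a candidate start
        prefix = kmer[:k - 1]
        # A predecessor shares k-1 bases with the current k-mer.
        for base in BASES:
            prev = base + prefix
            if prev in trusted and prev not in visited:
                visited.add(prev)
                queue.append(prev)
    return visited


# Toy example with k = 5 instead of 71.
toy_exon = "ATGCCGTTAGGT"
toy_trusted = set(kmers(toy_exon, 5)) | {"AAAAA"}  # "AAAAA" is unrelated noise
toy_walk = kmer_walk(
    seeds=[toy_exon[-5:]],
    trusted=toy_trusted,
    k=5,
    is_start=lambda km: km.startswith("ATG"),
)
```

In this toy case the walk collects exactly the k-mers of `toy_exon` and ignores the unrelated k-mer. The caption's reverse walk, from the start methionine to the intron, would extend with successors (suffix plus one base) instead of predecessors.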
