
Table 1. Same software, different transcripts: REFSEQ vs ENSEMBL by ANNOVAR annotation category

From: Choice of transcripts and software has a large effect on variant annotation

| Annotation category | REF+ENS | REF | ENS | Match | REF match rate (%) | ENS match rate (%) | Overall match rate (%) |
|---|---:|---:|---:|---:|---:|---:|---:|
| stopgain_SNV | 15,835 | 14,183 | 14,960 | 13,308 | 93.83 | 88.96 | 84.04 |
| frameshift_insertion | 6,980 | 5,298 | 6,495 | 4,813 | 90.85 | 74.10 | 68.95 |
| frameshift_deletion | 7,491 | 4,547 | 7,380 | 4,436 | 97.56 | 60.11 | 59.22 |
| stoploss_SNV | 946 | 503 | 906 | 463 | 92.05 | 51.10 | 48.94 |
| splicing | 47,878 | 14,154 | 45,839 | 12,115 | 85.59 | 26.43 | 25.30 |
| frameshift_substitution | 1,960 | 195 | 1,947 | 182 | 93.33 | 9.35 | 9.29 |
| nonsynonymous_SNV | 321,669 | 291,898 | 315,592 | 285,821 | 97.92 | 90.57 | 88.86 |
| nonframeshift_insertion | 3,506 | 2,888 | 2,844 | 2,226 | 77.08 | 78.27 | 63.49 |
| nonframeshift_deletion | 5,136 | 3,321 | 4,963 | 3,148 | 94.79 | 63.43 | 61.29 |
| nonframeshift_substitution | 933 | 226 | 843 | 136 | 60.18 | 16.13 | 14.58 |
| synonymous_SNV | 178,559 | 167,561 | 172,463 | 161,465 | 96.36 | 93.62 | 90.43 |
| UTR3 | 724,802 | 574,255 | 622,441 | 471,894 | 82.17 | 75.81 | 65.11 |
| UTR5 | 177,832 | 94,545 | 162,684 | 79,397 | 83.98 | 48.80 | 44.65 |
| UTR5_UTR3 | 2,183 | 292 | 2,092 | 201 | 68.84 | 9.61 | 9.21 |
| ncRNA_intronic | 8,992,009 | 2,113,428 | 8,244,441 | 1,365,860 | 64.63 | 16.57 | 15.19 |
| ncRNA_exonic | 654,098 | 140,303 | 597,947 | 84,152 | 59.98 | 14.07 | 12.87 |
| ncRNA_UTR3 | 53,379 | 10,712 | 47,133 | 4,466 | 41.69 | 9.48 | 8.37 |
| ncRNA_UTR5 | 10,683 | 1,989 | 9,444 | 750 | 37.71 | 7.94 | 7.02 |
| ncRNA_splicing | 13,931 | 1,051 | 13,562 | 682 | 64.89 | 5.03 | 4.90 |
| ncRNA_UTR5_ncRNA_UTR3 | 107 | 1 | 106 | 0 | 0.00 | 0.00 | 0.00 |
| intronic | 29,289,037 | 26,805,864 | 27,743,749 | 25,260,576 | 94.24 | 91.05 | 86.25 |
| intergenic | 50,305,202 | 49,797,113 | 41,307,708 | 40,799,619 | 81.93 | 98.77 | 81.10 |
| downstream | 991,811 | 474,684 | 840,376 | 323,249 | 68.10 | 38.46 | 32.59 |
| upstream | 910,818 | 440,728 | 762,664 | 292,574 | 66.38 | 38.36 | 32.12 |
| upstream_downstream | 53,608 | 15,621 | 47,293 | 9,306 | 59.57 | 19.68 | 17.36 |
| unknown | 11,205 | 6,215 | 5,703 | 713 | 11.47 | 12.50 | 6.36 |
| ALL LOF | 81,090 | 38,880 | 77,527 | 35,317 | 90.84 | 45.55 | 43.55 |
| ALL LOF and MISSENSE | 412,334 | 337,213 | 401,769 | 326,648 | 96.87 | 81.30 | 79.22 |
| ALL EXONIC | 590,893 | 504,774 | 574,232 | 488,113 | 96.70 | 85.00 | 82.61 |
| ALL | 80,981,575 | 80,981,575 | 80,981,575 | 69,181,552 | 85.43 | 85.43 | 85.43 |

  1. This table summarises, for each ANNOVAR annotation category, how many annotations match between the REFSEQ and ENSEMBL results. For each category it gives the number of variants assigned that annotation when using (i) either REFSEQ or ENSEMBL (‘REF+ENS’; union), (ii) REFSEQ alone (‘REF’) and (iii) ENSEMBL alone (‘ENS’). ‘Match’ is the number of variants that receive the same annotation with both transcript sets (intersection). The match rate for each transcript set is the number of matching annotations in a category expressed as a percentage of all annotations in that category from that transcript set. The final column, ‘Overall match rate’, is the percentage of variants given a particular annotation with either REFSEQ or ENSEMBL (‘REF+ENS’) that receive a matching annotation with both transcript sets. Categories are loosely ordered by severity of effect: putative loss-of-function (LoF) annotations are listed first, followed by nonsynonymous, synonymous, non-exonic categories and so on. Within each loose group, categories are sorted in descending order of overall match rate. The bottom four rows show the total degree of matching across all LoF categories, all LoF and missense categories, all exonic categories and, finally, all categories.
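The column definitions in the note reduce to simple arithmetic: ‘REF+ENS’ = REF + ENS − Match (inclusion–exclusion), REF match rate = 100 × Match / REF, ENS match rate = 100 × Match / ENS, and overall match rate = 100 × Match / (REF+ENS). The Python sketch below is a minimal illustration assuming only those definitions; the names `CategoryCounts`, `combine` and `LOF_ROWS` are hypothetical and do not come from the paper or from ANNOVAR. It also rebuilds the ‘ALL LOF’ row by summing the six LoF category rows, which agrees with the figures printed in the table.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryCounts:
    """REF, ENS and Match counts for one annotation category (hypothetical helper)."""

    ref: int    # variants given this annotation with REFSEQ transcripts ('REF')
    ens: int    # variants given this annotation with ENSEMBL transcripts ('ENS')
    match: int  # variants given this annotation with both transcript sets ('Match')

    @property
    def union(self) -> int:
        """'REF+ENS' column via inclusion-exclusion: |R u E| = |R| + |E| - |R n E|."""
        return self.ref + self.ens - self.match

    def rates(self) -> tuple[float, float, float]:
        """(REF, ENS, overall) match rates as percentages rounded to 2 d.p."""

        def pct(denominator: int) -> float:
            # Guard against empty categories: a zero denominator yields 0.0.
            return round(100 * self.match / denominator, 2) if denominator else 0.0

        return pct(self.ref), pct(self.ens), pct(self.union)


def combine(rows: Iterable[CategoryCounts]) -> CategoryCounts:
    """Build an aggregate row (e.g. 'ALL LOF') by summing per-category counts."""
    rows = list(rows)
    return CategoryCounts(
        ref=sum(r.ref for r in rows),
        ens=sum(r.ens for r in rows),
        match=sum(r.match for r in rows),
    )


# The six putative LoF categories, copied from Table 1 as (REF, ENS, Match).
LOF_ROWS = {
    "stopgain_SNV": CategoryCounts(14_183, 14_960, 13_308),
    "frameshift_insertion": CategoryCounts(5_298, 6_495, 4_813),
    "frameshift_deletion": CategoryCounts(4_547, 7_380, 4_436),
    "stoploss_SNV": CategoryCounts(503, 906, 463),
    "splicing": CategoryCounts(14_154, 45_839, 12_115),
    "frameshift_substitution": CategoryCounts(195, 1_947, 182),
}

if __name__ == "__main__":
    all_lof = combine(LOF_ROWS.values())
    print(all_lof.union, all_lof.rates())  # prints: 81090 (90.84, 45.55, 43.55)
```

Deriving ‘REF+ENS’ from the other three columns, rather than storing it, keeps each row internally consistent; the same check can be applied to any other row of the table.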