snpEff inappropriate interpretation of variants?
1
1
Entering edit mode
7.3 years ago
moxu ▴ 510

I have RNA-seq datasets, and I applied GATK variant calling procedure for RNA-seq (https://software.broadinstitute.org/gatk/documentation/article.php?id=3891). I have not found anything wrong with it yet. However, when I took the .vcf file output from GATK and used snpEff with GRCh 37.75 to assess the impact of variants, it seems either the genomic coordinates were not handled correctly or the variant impact assessment mechanism was off. For instance, one variant output from GATK was documented like the following:

chr19   7975586 .   C   T   275.78  .   AC=2;AF=1.00;AN=2;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.64;SOR=4.615 GT:AD:DP:GQ:PL  1/1:0,9:9:26:304,26,0

This denotes the variant as T/T at chr19:7975586, which corresponds to codon GAT (D or Asp) in an exon of MAP2K7. The reference allele is C/C, which forms the codon GAC coding for Asp as well. In other words, this variant does not change the underlying amino acid, it's a synonymous variant, and the impact of this variant should be very low if any?

However, snpEff generated the following results:

chr19   7975586 .   C   T   275.78  .   AC=2;AF=1.00;AN=2;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.64;SOR=4.615;ANN=T|structural_interaction_variant|HIGH|MAP2K7|ENSG00000076984|interaction|2DYL:A_166-A_207:ENST00000397983|protein_coding|7/12|c.621C>T||||||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000545011|protein_coding|6/11|c.699C>T|p.Asp233Asp|764/3497|699/1386|233/461||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000397981|protein_coding|6/11|c.573C>T|p.Asp191Asp|675/3430|573/1281|191/426||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000397983|protein_coding|7/12|c.621C>T|p.Asp207Asp|681/3396|621/1308|207/435||,T|synonymous_variant|LOW|MAP2K7|ENSG00000076984|transcript|ENST00000397979|protein_coding|6/11|c.573C>T|p.Asp191Asp|627/3361|573/1260|191/419||,T|upstream_gene_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000498118|retained_intron||n.-398C>T|||||398|,T|downstream_gene_variant|MODIFIER|CTD-3193O13.13|ENSG00000268149|transcript|ENST00000595655|antisense||n.*1947G>A|||||1947|,T|downstream_gene_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000475022|retained_intron||n.*3635C>T|||||3635|,T|non_coding_transcript_exon_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000468058|retained_intron|5/10|n.768C>T||||||,T|non_coding_transcript_exon_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000494348|retained_intron|2/3|n.496C>T||||||,T|non_coding_transcript_exon_variant|MODIFIER|MAP2K7|ENSG00000076984|transcript|ENST00000465324|retained_intron|2/2|n.365C>T||||||   GT:AD:DP:GQ:PL  1/1:0,9:9:26:304,26,0

snpEff annotated this variant as structural_interaction_variant with HIGH impact. How and why? As said above, shouldn't this simply be a meaningless variant? Is GRCh 37.75 the right match for hg19?

Another example is like the following:

GATK output --

chr16   3070708 .   TG  T   173.77  .   AC=2;AF=1.00;AN=2;DP=10;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=24.82;SOR=2.584    GT:AD:DP:GQ:PL  1/1:0,7:7:21:211,21,0

snpEff output --

 chr16  3070708 .   TG  T   173.77  .   AC=2;AF=1.00;AN=2;DP=10;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=24.82;SOR=2.584;ANN=T|frameshift_variant|HIGH|TNFRSF12A|ENSG00000006327|transcript|ENST00000575124|protein_coding|1/3|c.318delG|p.Ser107fs|341/1133|318/681|106/226||INFO_REALIGN_3_PRIME,T|upstream_gene_variant|MODIFIER|CLDN6|ENSG00000184697|transcript|ENST00000328796|protein_coding||c.-4687delC|||||2521|,T|upstream_gene_variant|MODIFIER|THOC6|ENSG00000131652|transcript|ENST00000326266|protein_coding||c.-3619delG|||||3323|,T|upstream_gene_variant|MODIFIER| ... (omitted)

The question is: this is a variant (indel) located either in an intronic region or in the 5' UTR of a TNFRSF12A transcript, how can this be interpreted as a frameshift variant?

Thanks for any suggestions!

RNA-Seq software error alignment variant calling • 5.3k views
ADD COMMENT
1
Entering edit mode

the 1st prediction, formatted:

T  structural_interaction_variant      HIGH      MAP2K7          ENSG00000076984  interaction  2DYL:A_166-A_207:ENST00000397983  protein_coding   7/12        c.621C>T
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000545011                   protein_coding   6/11        c.699C>T  p.Asp233Asp  764/3497  699/1386  233/461
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000397981                   protein_coding   6/11        c.573C>T  p.Asp191Asp  675/3430  573/1281  191/426
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000397983                   protein_coding   7/12        c.621C>T  p.Asp207Asp  681/3396  621/1308  207/435
T  synonymous_variant                  LOW       MAP2K7          ENSG00000076984  transcript   ENST00000397979                   protein_coding   6/11        c.573C>T  p.Asp191Asp  627/3361  573/1260  191/419
T  upstream_gene_variant               MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000498118                   retained_intron  n.-398C>T   398
T  downstream_gene_variant             MODIFIER  CTD-3193O13.13  ENSG00000268149  transcript   ENST00000595655                   antisense        n.*1947G>A  1947
T  downstream_gene_variant             MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000475022                   retained_intron  n.*3635C>T  3635
T  non_coding_transcript_exon_variant  MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000468058                   retained_intron  5/10        n.768C>T
T  non_coding_transcript_exon_variant  MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000494348                   retained_intron  2/3         n.496C>T
T  non_coding_transcript_exon_variant  MODIFIER  MAP2K7          ENSG00000076984  transcript   ENST00000465324                   retained_intron  2/2         n.365C>T
ADD REPLY
1
Entering edit mode

Really nicely formatted. Amazing. How did you achieve that?

ADD REPLY
2
Entering edit mode
ADD REPLY
0
Entering edit mode
 echo " AC=2;AF=(...)T00000465324|retained_intron|2/2|n.365C>T|||||| " | tr ";" "\n" | grep '^ANN=' | cut -c 5- | tr ',' '\n' | tr "|" "\t" | column -t
ADD REPLY
0
Entering edit mode

what's your version of SNPEFF ?

ADD REPLY
0
Entering edit mode

SnpEff version SnpEff 4.3p (build 2017-06-06 09:55), by Pablo Cingolani

Looks the most recent?

ADD REPLY
3
Entering edit mode
7.3 years ago
rbagnall ★ 1.8k

In the second example, the ensEMBL transcript ENST00000575124 has a protein coding region that extends into the intronic regions of other transcripts of this gene, and so explains the frameshift. You can see the extended protein coding region of transcript ENST00000575124 in the UCSC genome browser .

Not sure about the first example. A structural_interaction_variant suggests this amino acid is important for the protein structure, or for an interaction with another protein and so is flagged by SNPeff as an important amino acid site. However, in your case there is a synonymous change at this amino acid and so I would think your variant probably has no impact.

ADD COMMENT
0
Entering edit mode

Awesome that you've found a transcript that codes for TNFRSF12A!

As for the first example, does it actually mean the "interaction" is highly important, but whether the mutation makes any difference to the "interaction" is another matter?

ADD REPLY
0
Entering edit mode

Yeah, possibly. It may be a default annotation of an important amino acid residue.

ADD REPLY
0
Entering edit mode

The structural_interaction_variant is described in the "Protein interaction effects" section of the SnpEff documents. It's an annotation derived from the 3D structure data from PDB. Seems like it should not apply to synonymous variants.

If this can be reproduce in the latest version of SnpEff (5.0d, 2021-01-28), I'd recommend filing a bug report.

ADD REPLY

Login before adding your answer.

Traffic: 2065 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6