more than one annotation at a locus in SnpEff output?
7.8 years ago
sun.nation ▴ 140

I built the genome database myself and ran:

$JAVA -jar /home/sshrest1/bin//snpEff/snpEff.jar Corynespora c1VSc2.vcf > c1VSc2_annotated.vcf

I am getting multiple annotations at a locus. I was not able to figure out the issue. Any thoughts?

scaffold_1      3820284 .       G       A       5081.87 PASS    AC=1;AF=0.333;AN=3;BaseQRankSum=6.426;ClippingRankSum=0.000;DP=215;FS=0.531;MLEAC=1;MLEAF=0.333;MQ=60.00;MQRankSum=0.000;QD=29.82;ReadPosRankSum=-1.702;SOR=0.756;ANN=A|missense_variant|MODERATE|"estExt_Genemark1.C_1_t30310"|GENE_"estExt_Genemark1.C_1_t30310"|transcript|TRANSCRIPT_"estExt_Genemark1.C_1_t30310"|protein_coding|4/5|c.1834C>T|p.Pro612Ser|1834/2304|1834/2304|612/767||,A|missense_variant|MODERATE|"gm1.1310_g"|GENE_"gm1.1310_g"|transcript|TRANSCRIPT_"gm1.1310_g"|protein_coding|4/5|c.1834C>T|p.Pro612Ser|1834/2304|1834/2304|612/767||,A|missense_variant|MODERATE|"fgenesh1_kg.1_#_1261_#_Locus3512v2rpkm41.18"|GENE_"fgenesh1_kg.1_#_1261_#_Locus3512v2rpkm41.18"|transcript|TRANSCRIPT_"fgenesh1_kg.1_#_1261_#_Locus3512v2rpkm41.18"|protein_coding|3/3|c.1705C>T|p.Pro569Ser|1705/1953|1705/1953|569/650||,A|missense_variant|MODERATE|"e_gw1.1.1471.1"|GENE_"e_gw1.1.1471.1"|transcript|TRANSCRIPT_"e_gw1.1.1471.1"|protein_coding|4/4|c.1801C>T|p.Pro601Ser|1801/2049|1801/2049|601/682||,A|missense_variant|MODERATE|"e_gw1.1.1890.1"|GENE_"e_gw1.1.1890.1"|transcript|TRANSCRIPT_"e_gw1.1.1890.1"|protein_coding|5/5|c.1663C>T|p.Pro555Ser|1663/1911|1663/1911|555/636||,A|missense_variant|MODERATE|"e_gw1.1.2199.1"|GENE_"e_gw1.1.2199.1"|transcript|TRANSCRIPT_"e_gw1.1.2199.1"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1.C_1_t60389"|GENE_"estExt_Genewise1.C_1_t60389"|transcript|TRANSCRIPT_"estExt_Genewise1.C_1_t60389"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1.C_1_t60390"|GENE_"estExt_Genewise1.C_1_t60390"|transcript|TRANSCRIPT_"estExt_Genewise1.C_1_t60390"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1.C_1_t60391"|GENE_"estExt_Genewise1.C_1_t60391"|transcript|TRANSCRIPT_"estExt_Genewise1.C_1_t60391"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1Plus.C_1_t60371"|GENE_"estExt_Genewise1Plus.C_1_t60371"|transcript|TRANSCRIPT_"estExt_Genewise1Plus.C_1_t60371"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1Plus.C_1_t60372"|GENE_"estExt_Genewise1Plus.C_1_t60372"|transcript|TRANSCRIPT_"estExt_Genewise1Plus.C_1_t60372"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_Genewise1Plus.C_1_t60373"|GENE_"estExt_Genewise1Plus.C_1_t60373"|transcript|TRANSCRIPT_"estExt_Genewise1Plus.C_1_t60373"|protein_coding|4/4|c.1834C>T|p.Pro612Ser|1834/2082|1834/2082|612/693||,A|missense_variant|MODERATE|"estExt_fgenesh1_pg.C_1_t30330"|GENE_"estExt_fgenesh1_pg.C_1_t30330"|  and so on
snpeff • 4.6k views

Each annotation seems to be for a different transcript. Either you have many transcripts overlapping that locus or you have a problem with the original gene annotation file.
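
To check this on your own file, it can help to split the ANN field into one record per transcript. The following is a minimal sketch, not part of SnpEff: the function name `parse_ann` and the lower-case field labels are my own, and the field order is assumed to follow the standard 16-subfield ANN layout. It also assumes gene names contain no unescaped `,`, `;` or `|` characters. The two example entries are copied from the record in the question.

```python
# Labels for the 16 ANN subfields, in order (names chosen here for illustration).
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(info):
    """Return one dict per ANN entry found in a VCF INFO string."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            # Entries are comma-separated; subfields are pipe-separated.
            return [dict(zip(ANN_FIELDS, e.split("|"))) for e in entries]
    return []


# Shortened INFO string: two of the entries from the question's record.
info = (
    'DP=215;ANN='
    'A|missense_variant|MODERATE|"gm1.1310_g"|GENE_"gm1.1310_g"|transcript|'
    'TRANSCRIPT_"gm1.1310_g"|protein_coding|4/5|c.1834C>T|p.Pro612Ser|'
    '1834/2304|1834/2304|612/767||,'
    'A|missense_variant|MODERATE|"e_gw1.1.1471.1"|GENE_"e_gw1.1.1471.1"|transcript|'
    'TRANSCRIPT_"e_gw1.1.1471.1"|protein_coding|4/4|c.1801C>T|p.Pro601Ser|'
    '1801/2049|1801/2049|601/682||'
)

for ann in parse_ann(info):
    print(ann["feature_id"], ann["annotation"], ann["hgvs_p"])
```

If every record lists a different `feature_id`, the repeated annotations come from overlapping gene models in the annotation file rather than from SnpEff itself.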


"I am getting multiple annotations at a locus. I was not able to figure out the issue"

LIFE

6.3 years ago
jilguero888 ▴ 20

From the SnpEff manual:

"Usually there is more than one effect reported in each EFF field. There are several reasons for this:

- A variant can affect multiple genes. E.g. a variant can be DOWNSTREAM from one gene and UPSTREAM from another gene.
- In complex organisms, genes usually have multiple transcripts. So SnpEff reports the effect of a variant on each transcript.
- A VCF line can have more than one variant. E.g. If reference genome is 'G', but the sample has either 'A' or 'T' (non-biallelic variant), then this will be reported as one VCF line, having multiple alternative variants (notice that there are two ALTs)"

The first reason is probably the most troublesome. If that is your problem, you can change the size of the upstream/downstream region with the option -ud size_in_bases. From the SnpEff manual:

"You can change the default upstream and downstream interval size (default is 5K) using the -ud size_in_bases option. This also allows to eliminate any upstream and downstream effect by using "-ud 0"."

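If re-running SnpEff is inconvenient, a roughly similar effect can be had after the fact by dropping ANN entries that are purely upstream or downstream effects. This is only a post-hoc filter on the output, not what `-ud 0` does internally. The function name `drop_up_downstream` and the example INFO string (genes `geneA` and `geneB`) are made up for illustration. The sketch assumes the effect is in the second ANN subfield, with multiple effects joined by `&`.

```python
# Sequence Ontology terms SnpEff uses for upstream/downstream effects.
UP_DOWN = {"upstream_gene_variant", "downstream_gene_variant"}


def is_up_downstream_only(entry):
    """True if every effect listed in this ANN entry is upstream/downstream."""
    fields = entry.split("|")
    if len(fields) < 2:
        return False  # malformed entry: keep it rather than guess
    return set(fields[1].split("&")) <= UP_DOWN


def drop_up_downstream(info):
    """Return the INFO string with upstream/downstream-only ANN entries removed."""
    kept_items = []
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            kept = [e for e in entries if not is_up_downstream_only(e)]
            if kept:  # drop the ANN key entirely if nothing is left
                kept_items.append("ANN=" + ",".join(kept))
        else:
            kept_items.append(item)
    return ";".join(kept_items)


# Hypothetical INFO string: one missense entry and one upstream entry.
info = (
    "DP=10;ANN="
    "A|missense_variant|MODERATE|geneA|geneA|transcript|tA|protein_coding|"
    "1/1|c.1A>T|p.Met1Leu|1/9|1/9|1/3||,"
    "A|upstream_gene_variant|MODIFIER|geneB|geneB|transcript|tB|protein_coding|"
    "|c.-100A>T|||||100|"
)

print(drop_up_downstream(info))
```

Note that this does not reduce the number of annotations when, as in the question, every entry is a missense_variant on a different transcript.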

So how should one choose the correct annotation? I am having the same issue: there are multiple annotations, and I am not sure which one is correct.
