variant annotation issue
7.7 years ago
prasundutta87 ▴ 670

Hi,

I recently performed variant annotation using SnpEff in water buffalo. The variants were called from RNA-seq reads with the bcftools call program (multiallelic caller) from the Samtools package. The BAM files were produced by aligning the RNA-seq reads to the water buffalo genome available on the NCBI Genome website.

SnpEff annotated many variants as downstream gene variants, which does not seem right because the variants were called from RNA-seq reads. Can anyone shed some light on this?
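To put a number on "many", it may help to tally the effect terms in the annotated VCF. The sketch below assumes the annotations are stored in SnpEff's `ANN` INFO field, where each comma-separated entry is pipe-delimited and the second subfield holds the effect term. The function name `tally_ann_effects` and the toy records are hypothetical, and the toy `ANN` entries are shortened to their first four subfields for readability.

```python
from collections import Counter


def tally_ann_effects(vcf_lines):
    """Count effect terms across all ANN entries in an iterable of VCF lines.

    Note: this counts annotation entries, not variants. One variant can
    carry several entries (for example one per nearby transcript).
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # no INFO column present
        for entry in fields[7].split(";"):
            if not entry.startswith("ANN="):
                continue
            for ann in entry[len("ANN="):].split(","):
                parts = ann.split("|")
                if len(parts) > 1:
                    # Combined effects are joined with '&'
                    for effect in parts[1].split("&"):
                        counts[effect] += 1
    return counts


# Hypothetical toy records, not real water buffalo data
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "scaffold1\t100\t.\tA\tG\t50\tPASS\tDP=20;ANN=G|downstream_gene_variant|MODIFIER|GENE1",
    "scaffold1\t200\t.\tC\tT\t60\tPASS\tDP=30;ANN=T|missense_variant|MODERATE|GENE2,T|downstream_gene_variant|MODIFIER|GENE3",
]

for effect, n in tally_ann_effects(example_vcf).most_common():
    print(effect, n)
```

On a real file, the same function could be fed an open file handle, for example `tally_ann_effects(open("annotated.vcf"))`, where the file name is a placeholder.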

RNA-Seq SNP software error genome alignment • 2.2k views

Are you sure you used the same assembly for mapping and annotation?


Yes, I did. There is only one draft assembly in the NCBI Genome database for the Mediterranean water buffalo.


I hope I am not missing anything logically. I am not sure what exactly to look for or check in order to find the reason behind it. Any suggestion is welcome.


You could try annotating with a different annotator and compare results. That could probably help you understand what's going wrong.


Yes, I am trying to do that with VEP, but Ensembl does not have the water buffalo genome. I am trying to use the standalone Perl version of VEP with the water buffalo genome and annotation file available in the NCBI Genome database.

7.7 years ago

Genomes are often higher quality than transcriptomes. If the transcriptome is incomplete (which it always is, for vertebrates) you will get RNA reads mapping to genes that were not annotated. This is a good thing, as it can advance science! You've possibly discovered previously unknown genes.
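One rough way to see how many RNA-seq variants sit outside the current annotation is to compare variant positions against annotated gene intervals. This is only a sketch: the function name `outside_annotated_genes` is hypothetical, gene coordinates are assumed to be 1-based and inclusive (as in GFF), and the toy coordinates are made up.

```python
def outside_annotated_genes(variants, genes):
    """Return the variants that do not overlap any annotated gene interval.

    variants: list of (chrom, pos) tuples
    genes:    list of (chrom, start, end) tuples, 1-based inclusive
    """
    # Group gene intervals by chromosome/scaffold for quick lookup
    by_chrom = {}
    for chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end))

    unannotated = []
    for chrom, pos in variants:
        intervals = by_chrom.get(chrom, [])
        if not any(start <= pos <= end for start, end in intervals):
            unannotated.append((chrom, pos))
    return unannotated


# Hypothetical toy data
genes = [("scaffold1", 100, 1000)]
variants = [("scaffold1", 150), ("scaffold1", 5000), ("scaffold2", 10)]
print(outside_annotated_genes(variants, genes))
```

Variants reported by this check are supported by expressed reads but lie outside every annotated gene, which is the situation described above. The linear scan is fine for a quick look; a sorted or interval-tree lookup would be a better fit for whole-genome call sets.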


The thing is that I annotated the variants with SnpEff using the -onlyprotein option (meaning: only use protein-coding transcripts). I thought that because of this, non-coding genes and probable genes would be avoided, since I am focusing on only one specific type of gene. I guess the -onlyprotein option did not work correctly, or I misunderstood it.


It seems like you might be confused about the nature of annotation. The water buffalo genome is not complete. The water buffalo transcriptome is even less complete. So, you can map reads to an official water buffalo reference, but the results will not be perfectly correct. It does not matter which variant-caller or annotation software you use - if the reference or annotation is incomplete, you will get incorrect results.


Yes, the genome/transcriptome is not complete. What I will do is complement my RNA-seq variant calls with variant calls from DNA-seq data and hopefully get a consensus.
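A minimal sketch of the consensus idea, assuming both call sets are plain-text VCFs against the same assembly and that variants are represented the same way in both (for example, indels normalized identically). The function names `variant_keys` and `consensus` and the toy records are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF lines, one key per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed record
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def consensus(rna_lines, dna_lines):
    """Return the variants present in both the RNA-seq and DNA-seq call sets."""
    return variant_keys(rna_lines) & variant_keys(dna_lines)


# Hypothetical toy records
rna_calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "scaffold1\t100\t.\tA\tG\t50\tPASS\t.",
    "scaffold1\t200\t.\tC\tT,G\t60\tPASS\t.",
    "scaffold1\t300\t.\tG\tA\t40\tPASS\t.",
]
dna_calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "scaffold1\t100\t.\tA\tG\t80\tPASS\t.",
    "scaffold1\t200\t.\tC\tG\t70\tPASS\t.",
]

print(sorted(consensus(rna_calls, dna_calls)))
```

RNA-only calls left out of the intersection are not necessarily wrong. They may reflect RNA editing, allele-specific expression, or alignment artifacts near splice junctions, so they are worth a separate look rather than being discarded outright.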
