Hi everyone,
I've used a variant calling pipeline to produce a VCF file from non-model organism sequencing data. The VCF file has about the number of variants I expected; however, the SnpEff prediction seems to give inaccurate numbers of effects in the ann.vcf file. I followed the manual's instructions to build a SnpEff database in two different ways:
1) sequences.fa + genes.gff file (with no intron or intergenic regions).
2) sequences.fa + genes.gtf file, converted from the previous GFF file using the gffread tool.
Both ways produced inaccurate numbers of effects in ann.vcf, but the second way gave fewer warnings and much better results. I've read a previous post about producing a GTF file with only the longest transcript, but that didn't solve my problem.
Can anyone help me?
Thank you all
Mohammed
Can you elaborate in more detail?
The number of modifiers is higher than the number of variants, and the GTF file doesn't contain intron or intergenic regions. Check the following output:
Of course, there is more than one prediction per variant (alternative transcripts, etc.).
Introns and intergenic regions are inferred from the exons and the genes...
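To see how many predictions each variant actually gets, something like the following minimal Python sketch may help. It assumes the annotated file uses SnpEff's `ANN` INFO field, where entries for different transcripts are comma-separated. The function names and the tiny example records are made up for illustration and are not from the thread.

```python
def ann_entries(info):
    """Return the list of ANN entries in a VCF INFO string (empty if none)."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            return field[4:].split(",")
    return []


def annotations_per_variant(vcf_lines):
    """Yield (chrom, pos, number_of_ANN_entries) for every VCF data line."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        yield cols[0], int(cols[1]), len(ann_entries(cols[7]))


# Hypothetical records: the first variant is annotated against two genes.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t"
    "ANN=G|missense_variant|MODERATE|geneA|geneA|transcript|tA.1|protein_coding|||||||,"
    "G|upstream_gene_variant|MODIFIER|geneB|geneB|transcript|tB.1|protein_coding|||||||",
    "chr1\t200\t.\tC\tT\t60\tPASS\t"
    "ANN=T|stop_gained|HIGH|geneA|geneA|transcript|tA.1|protein_coding|||||||",
]

counts = list(annotations_per_variant(example))
total_annotations = sum(n for _, _, n in counts)
print(len(counts), "variants,", total_annotations, "annotations")
```

On a real file you could pass an open file handle (for example `open("ann.vcf")`) instead of the `example` list. A total number of annotations larger than the number of variants is expected whenever a variant overlaps or lies near more than one transcript.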
How can I resolve this and get a better prediction?
What do you want to resolve?
I'm trying to reproduce a published work to verify the variant calling pipeline. I downloaded the raw sequencing data and followed the same steps they did: 1) mapping to the reference genome (I got the same results); 2) creating realignment targets and realigning around indels; 3) applying Base Quality Score Recalibration; 4) calling variants (HaplotypeCaller).
I got a VCF file with numbers of variants (SNPs and indels) close to those in the publication. However, after following the manual's instructions for building a database in SnpEff, I got variant effect predictions that differ from the publication.
Did you filter your VCF?
Yes. Here is an example of the difference: I had STOP_GAINED: 247, while the published work had STOP_GAINED: 1343. I've got differences in several other annotations as well.
How did you filter your VCF? Did you use the same filters as the original paper?
Yes, using bcftools. But even before filtering, I tested SnpEff on the raw variants, and I'm still getting lower numbers of STOP_GAINED and many other effects, and higher numbers of other effects (Modifier).
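When comparing effect counts with a publication, it may be worth checking which unit is being counted. The Python sketch below tallies each effect twice: once per annotation (every transcript counts) and once per variant (an effect counts at most once per variant). It assumes the `ANN` field format with the effect in the second `|`-separated position and combined effects joined by `&`. The function name and example records are hypothetical, and this is only one possible source of a discrepancy, not a diagnosis of the numbers above.

```python
from collections import Counter


def effect_tallies(vcf_lines):
    """Return (per_annotation, per_variant) Counters of effect terms."""
    per_annotation = Counter()
    per_variant = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]
        seen = set()  # effects seen for this variant, counted once each
        for field in info.split(";"):
            if not field.startswith("ANN="):
                continue
            for entry in field[4:].split(","):
                parts = entry.split("|")
                if len(parts) < 2:
                    continue
                for effect in parts[1].split("&"):
                    per_annotation[effect] += 1
                    seen.add(effect)
        per_variant.update(seen)
    return per_annotation, per_variant


# Hypothetical records: the first variant hits two transcripts of one gene.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tC\tT\t50\tPASS\t"
    "ANN=T|stop_gained|HIGH|geneA|geneA|transcript|tA.1|protein_coding|||||||,"
    "T|stop_gained|HIGH|geneA|geneA|transcript|tA.2|protein_coding|||||||",
    "chr1\t200\t.\tG\tA\t60\tPASS\t"
    "ANN=A|stop_gained&splice_region_variant|HIGH|geneB|geneB|transcript|tB.1|protein_coding|||||||",
]

per_annotation, per_variant = effect_tallies(demo)
print("per annotation:", dict(per_annotation))
print("per variant:", dict(per_variant))
```

If the two tallies differ a lot on your ann.vcf, it may help to check which of them the publication reports before concluding that the database build is wrong.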
Please add comments via "Add comment". The answer box is intended for answers. That will keep the thread logically organized.