Hello,
I have called variants using a pipeline consisting of samtools mpileup, bcftools call, and bcftools filter to obtain a VCF file containing SNPs and short indels.
I would like to annotate the SNPs and indels in my VCF file to predict their functional effects. From my understanding, most programs require that the sequence names in the CHROM column of the VCF file match the chromosome names in the annotation file or database.
I'm working with SNPs and indels called from a de novo transcriptome assembled by Trinity, so the variant calls in my VCF file look like this:
TRINITY_DN165715_c0_g1_i1 349 . A G 91 PASS DP=11;VDB=0.746774;SGB=-0.676189;MQSB=0.0297172;MQ0F=0;AC=2;AN=2;DP4=0,0,6,5;MQ=17 GT:PL 1/1:121,33,0
Is there a script that I can use to reformat the sequence names in my VCF file into a more generic format that variant annotation programs accept?
Any info would be greatly appreciated.
Cheers,
Mike
Do you know how to convert from your custom transcript locations to genome locations? What organism are you working with?
Thanks for getting back to me, Sean. At the moment, no, I don't know how to convert from my custom transcript locations to genome locations. Would that be the first step in getting my variant headers reformatted? I'm working with flying squirrels and calling SNPs between two species.
Mike
Hi Sean,
Are you familiar with a reliable program that can convert transcript locations to genome locations?
Mike
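On the renaming question itself: if all that is needed is to swap the Trinity IDs for other names, that part can be scripted. Below is a minimal Python sketch, not a tested tool. It assumes a tab-delimited VCF and a mapping from old to new names that you supply yourself. The function name rename_vcf_contigs, the new name transcript_1, and the contig length in the example header are all made up for illustration. Renaming only changes the labels; the annotation program would still need an annotation for those same sequences. bcftools annotate also has a --rename-chrs option that takes a mapping file, which may be simpler if bcftools is already in the pipeline.

```python
import re

# Matches the ID field of a ##contig header line, e.g. ##contig=<ID=name,length=...>
CONTIG_RE = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_vcf_contigs(lines, name_map):
    """Yield VCF lines with sequence names replaced according to name_map.

    Names that are missing from name_map are left unchanged.
    """
    for line in lines:
        if line.startswith("##contig="):
            # Header line: rewrite only the ID value.
            yield CONTIG_RE.sub(
                lambda m: m.group(1) + name_map.get(m.group(2), m.group(2)),
                line,
            )
        elif line.startswith("#"):
            # Any other header line is passed through untouched.
            yield line
        else:
            # Data line: CHROM is the first tab-separated column.
            chrom, sep, rest = line.partition("\t")
            yield name_map.get(chrom, chrom) + sep + rest


# Hypothetical mapping; in practice this would be read from a two-column file.
name_map = {"TRINITY_DN165715_c0_g1_i1": "transcript_1"}

# Small in-memory example (the contig length here is illustrative only).
vcf = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=TRINITY_DN165715_c0_g1_i1,length=1200>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n",
    "TRINITY_DN165715_c0_g1_i1\t349\t.\tA\tG\t91\tPASS\tDP=11\tGT:PL\t1/1:121,33,0\n",
]

renamed = list(rename_vcf_contigs(vcf, name_map))
print("".join(renamed), end="")
```

For a real file, the same function can be fed an open file handle instead of the list, writing each yielded line to a new VCF.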