Hello,
I have a VCF file, generated by samtools/bcftools, containing SNPs called from an alignment file against a Trinity reference assembly.
I have annotated the Trinity reference assembly using the Trinotate pipeline (blastx of Trinity transcripts against SwissProt, blastp of TransDecoder-predicted proteins from Trinity transcripts against SwissProt, and HMMER of TransDecoder-predicted proteins from Trinity transcripts against Pfam).
Now, I would like to annotate the SNPs contained in my VCF file using my annotated Trinity reference assembly. Is this possible?
My Trinity transcripts in the reference assembly are formatted like so:
>TRINITY_DN1000|c115_g5_i1 len=247 path=[31015:0-148 23018:149-246]
AATCTTTTTTGGTATTGGCAGTACTGTGCTCTGGGTAGTGATTAGGGCAAAAGAAGACAC
ACAATAAAGAACCAGGTGTTAGACGTCAGCAAGTCAAGGCCTTGGTTCTCAGCAGACAGA
AGACAGCCCTTCTCAATCCTCATCCCTTCCCTGAACAGACATGTCTTCTGCAAGCTTCTC
CAAGTCAGTTGTTCACAGGAACATCATCAGAATAAATTTGAAATTATGATTAGTATCTGA
TAAAGCA
So, as far as I can tell, I can't use a program like snpEff, since it relies on reference genomes with chr,pos coordinates.
Has this been done before? Any info greatly appreciated as always. Thanks!
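One thing that may help: when variants are called against a Trinity FASTA, the CHROM column of the VCF holds the transcript ID (the first whitespace-delimited token of the FASTA header), so SNPs can be joined to per-transcript annotations by that ID. Below is a minimal Python sketch of such a join. The column names `transcript_id` and `sprot_Top_BLASTX_hit` are assumptions about the annotation report layout, and the VCF records and hit names are made-up placeholders, so adjust them to your own files.

```python
import csv
import io

# Hypothetical example data; in practice these would be files opened from disk.
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "TRINITY_DN1000|c115_g5_i1\t38\t.\tG\tA\t50\t.\tDP=20\n"
    "TRINITY_DN2000|c0_g1_i1\t7\t.\tC\tT\t33\t.\tDP=11\n"
)

# Assumed layout: tab-delimited report with a 'transcript_id' column.
ANNOT_TEXT = (
    "#gene_id\ttranscript_id\tsprot_Top_BLASTX_hit\n"
    "TRINITY_DN1000|c115_g5\tTRINITY_DN1000|c115_g5_i1\tHYPOTHETICAL_HIT_1\n"
)


def load_annotations(handle, id_column="transcript_id"):
    """Map transcript ID -> list of annotation rows (a transcript may have several)."""
    annotations = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        annotations.setdefault(row[id_column], []).append(row)
    return annotations


def annotate_vcf(vcf_handle, annotations):
    """Yield (transcript_id, pos, ref, alt, annotation_rows) for each VCF record."""
    for line in vcf_handle:
        # Skip meta lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Transcripts without any annotation get an empty list.
        yield chrom, pos, ref, alt, annotations.get(chrom, [])


annotations = load_annotations(io.StringIO(ANNOT_TEXT))
for chrom, pos, ref, alt, rows in annotate_vcf(io.StringIO(VCF_TEXT), annotations):
    hits = [row["sprot_Top_BLASTX_hit"] for row in rows] or ["no annotation"]
    print(chrom, pos, ref, alt, ";".join(hits), sep="\t")
```

This only attaches transcript-level annotation to each SNP; it says nothing about the effect of the SNP on the encoded protein.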
See Philipp Bayer's answer in a more recent thread:
A: How to know synonymous and non-synonymous SNPs from transcriptome sequencing
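To illustrate the general idea behind a synonymous vs non-synonymous call (this is a rough sketch, not necessarily the method used in the linked answer): if you know where the ORF sits on a transcript, for example from the TransDecoder predictions, you can find the codon a SNP falls in, swap in the alternate base, and compare the translated amino acids. The sketch below assumes a plus-strand ORF, 1-based inclusive coordinates, single-base substitutions only, and the standard genetic code. The function name `classify_snp` and the toy sequence are made up for illustration; minus-strand ORFs would need reverse-complementing first.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(transcript_seq, cds_start, cds_end, pos, alt):
    """Classify a single-base substitution relative to a plus-strand ORF.

    cds_start, cds_end and pos are 1-based, inclusive transcript coordinates.
    """
    if not (cds_start <= pos <= cds_end):
        return "non-coding"
    offset = pos - cds_start                         # 0-based offset into the CDS
    codon_start = cds_start - 1 + (offset // 3) * 3  # 0-based index in transcript
    ref_codon = transcript_seq[codon_start:codon_start + 3].upper()
    if len(ref_codon) < 3:
        return "partial-codon"                       # ORF runs off the transcript end
    idx = offset % 3                                 # SNP position within the codon
    alt_codon = ref_codon[:idx] + alt.upper() + ref_codon[idx + 1:]
    ref_aa = CODON_TABLE.get(ref_codon)
    alt_aa = CODON_TABLE.get(alt_codon)
    if ref_aa is None or alt_aa is None:
        return "unknown"                             # e.g. ambiguous bases such as N
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# Toy transcript: 2 bases of 5' UTR, then ATG GCT AAA TAG, then 2 bases of 3' UTR.
toy_seq = "CCATGGCTAAATAGCC"
print(classify_snp(toy_seq, 3, 14, 8, "C"))  # GCT -> GCC, Ala -> Ala
print(classify_snp(toy_seq, 3, 14, 6, "C"))  # GCT -> CCT, Ala -> Pro
print(classify_snp(toy_seq, 3, 14, 1, "T"))  # outside the ORF
```

Indels, multi-allelic sites, and SNPs in the same codon as another SNP are not handled here and would need extra care.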