I ran the GATK variant discovery tools on my BAM files and want to use the output to see how the variants may affect protein production. I would also like more information on which transcripts the variants fall in. I formatted the GATK file by removing the header and adding a column for the stop locus (end position). Then I ran ANNOVAR with:
perl annotate_variation.pl -geneanno -buildver hg19 gatk_files/... gatk_files/...
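For reference, the formatting step described above (dropping the header and adding an end column) might look roughly like the sketch below. It assumes the GATK output is a standard tab-delimited VCF and handles simple single-base substitutions only; the function name `vcf_to_avinput` is hypothetical, and indels are skipped because they need different start/end handling.

```python
def vcf_to_avinput(lines):
    """Turn VCF records into 5-column rows: chr, start, end, ref, alt.

    Assumption: tab-delimited VCF text lines. Only SNVs are kept.
    """
    rows = []
    for line in lines:
        # Skip blank lines and all header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            # Keep single-base substitutions; indels are out of scope here.
            if len(ref) != 1 or len(alt) != 1:
                continue
            # For an SNV the end position equals the start position.
            rows.append("\t".join([chrom, str(pos), str(pos), ref, alt]))
    return rows


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t881627\t.\tG\tA\t50\tPASS\t.",
]
for row in vcf_to_avinput(example):
    print(row)
```

The example record reuses the coordinates from the output line quoted in this post; the QUAL, FILTER and INFO values are made up for illustration.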
annotate_variation.pl runs and leaves me with a gene-annotated file called ..exonic_variant_function. The problem with the ANNOVAR output is that the COSMIC variant identifiers don't match the gene. For instance, the first line in the file is:
line262 synonymous SNV NOC2L:NM_015658:exon16:c.C1843T:p.L615L, 1 881627 881627 G A
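To make the fields of that line explicit, here is a minimal parsing sketch. It assumes the file is tab-delimited (the line above is shown with spaces) and that each annotation entry has the gene:transcript:exon:cDNA:protein layout seen here; the function name `parse_exonic_line` and the dictionary keys are hypothetical.

```python
def parse_exonic_line(line):
    """Split one exonic_variant_function line into named fields.

    Assumption: tab-delimited columns in the order shown in the post.
    """
    fields = line.rstrip("\n").split("\t")
    keys = ("gene", "transcript", "exon", "cdna_change", "protein_change")
    transcripts = []
    # The third column may list several transcripts, separated by commas.
    for entry in fields[2].split(","):
        if entry:
            transcripts.append(dict(zip(keys, entry.split(":"))))
    return {
        "line_id": fields[0],
        "consequence": fields[1],
        "transcripts": transcripts,
        "chrom": fields[3],
        "start": int(fields[4]),
        "end": int(fields[5]),
        "ref": fields[6],
        "alt": fields[7],
    }


line = ("line262\tsynonymous SNV\t"
        "NOC2L:NM_015658:exon16:c.C1843T:p.L615L,\t"
        "1\t881627\t881627\tG\tA")
record = parse_exonic_line(line)
for t in record["transcripts"]:
    print(t["gene"], t["transcript"], t["cdna_change"], t["protein_change"])
```

Keeping the gene and transcript next to the c. and p. strings matters because those strings only describe a position relative to that particular transcript and its protein.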
A COSMIC search shows that p.L615L is a variant in KIAA1755 and not in NOC2L. My end goal is to alter the RNA sequence of each transcript in which a variant is found and translate it into the variant-modified protein, in order to build a modified peptide-centric database. From what I understand, the COSMIC accession number gives the location of the variant relative to the start site, which could make this process easy... if ANNOVAR were reporting the correct variant-to-transcript assignment.
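The sequence-editing step could be sketched as follows. This is only an illustration under stated assumptions: the c. position is taken to count from the first base of the coding sequence (the A of the start codon), only single-base changes written like c.C1843T are handled, and the standard genetic code is used. The helper names `apply_cdna_snv` and `translate` are hypothetical, and the toy sequence is not a real transcript.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_cdna_snv(cds, change):
    """Apply a change such as 'c.C1843T' to a coding sequence.

    Assumption: the position is 1-based from the first base of the CDS.
    """
    match = re.fullmatch(r"c\.([ACGT])(\d+)([ACGT])", change)
    if match is None:
        raise ValueError(f"unsupported change notation: {change}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    idx = pos - 1
    if idx >= len(cds):
        raise ValueError(f"position {pos} is beyond the sequence end")
    # A reference mismatch suggests the wrong transcript or coordinate system.
    if cds[idx] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {cds[idx]}")
    return cds[:idx] + alt + cds[idx + 1:]


toy_cds = "ATGCTGTAA"  # toy sequence: Met, Leu, stop
mutated = apply_cdna_snv(toy_cds, "c.C4T")
print(translate(toy_cds), translate(mutated))  # both print ML* (synonymous)
```

The reference-base check is the useful part of this design: if the base stored in the transcript does not match the one named in the c. string, the variant is probably being applied to the wrong transcript.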
Any ideas??
Thanks,
Jeremy
You are absolutely right Karl. I allowed myself to be led astray by the siren calls of Google.
Thanks,
Jeremy