Dear all,
I've just performed exome sequencing and obtained the VCF file. To continue with my experiment, I need to extract the wild-type (wt) and mutant (mut) flanking regions for my dataset, because I need to synthesize them for immunotherapy research. Specifically, my VCF file has a column like this:
AAChange.refGene
A2M:NM_000014:exon30:c.C3797A:p.A1266E
ABCC12:NM_033226:exon12:c.G1738T:p.G580C
ABL1:NM_005157:exon11:c.C2972T:p.A991V,ABL1:NM_007313:exon11:c.C3029T:p.A1010V
And the desired output is like this:
Wt Epitope Mut Epitope
TVVALHALSKYGAATFTRTGKAAQV TVVALHALSKYGEATFTRTGKAAQV
DHQRYQHTVRVCGLQKDLSNLPYGD DHQRYQHTVRVCCLQKDLSNLPYGD
APVPSTLPSASSALAGDQPSSTAFI APVPSTLPSASSVLAGDQPSSTAFI
If there is more than one transcript, I need the first one. I know how to obtain the flanking regions for nucleotides, but I have not found anything like a refGene.txt for amino acids. I used hg19 as the reference genome.
Any help is welcome!
Thank you, Chris! Unfortunately, I do not have much time to learn how to use pVACtools, because I would need to use a different format for my VCF file... I think there is a faster way to do what I need: if we have the amino acid position (e.g. G580C), a script similar to bedtools could extract the flanking positions. If anyone can help, I will be very happy :)
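To make the idea concrete, here is a minimal Python sketch of the kind of script I mean. It is only an illustration under several assumptions: the function names (parse_first_transcript, epitope_pair) are made up for this example, the flank of 12 residues is inferred from the 25-mers in my desired output, and the protein sequence for each transcript has to come from somewhere else (for example a protein FASTA keyed by transcript ID, which is not shown here). It only handles simple missense changes of the form p.A1266E.

```python
import re

# Matches a simple missense protein change such as "p.A1266E".
# Anything else (frameshifts, stop gains, ...) is rejected on purpose.
AA_CHANGE = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def parse_first_transcript(aa_change_field):
    """Parse the first entry of an AAChange.refGene value.

    Returns (gene, transcript, ref_aa, position, alt_aa), where position
    is the 1-based amino acid position.
    """
    first = aa_change_field.split(",")[0]  # keep only the first transcript
    parts = first.split(":")
    if len(parts) != 5:
        raise ValueError(f"unexpected annotation format: {first}")
    gene, transcript, _exon, _cdna, protein = parts
    match = AA_CHANGE.match(protein)
    if match is None:
        raise ValueError(f"not a simple missense change: {protein}")
    ref_aa, pos, alt_aa = match.groups()
    return gene, transcript, ref_aa, int(pos), alt_aa


def epitope_pair(protein_seq, ref_aa, pos, alt_aa, flank=12):
    """Return (wt, mut) peptides centred on the changed residue.

    protein_seq is assumed to be the full protein sequence of the
    transcript (hypothetical input, not derived from the VCF).
    The window is truncated if the change is near either end.
    """
    idx = pos - 1  # convert 1-based position to 0-based index
    if idx < 0 or idx >= len(protein_seq):
        raise ValueError(f"position {pos} is outside the protein")
    if protein_seq[idx] != ref_aa:
        # Usually means the sequence does not match the annotated transcript.
        raise ValueError(
            f"expected {ref_aa} at position {pos}, found {protein_seq[idx]}"
        )
    start = max(0, idx - flank)
    end = min(len(protein_seq), idx + flank + 1)
    wt = protein_seq[start:end]
    mut = protein_seq[start:idx] + alt_aa + protein_seq[idx + 1:end]
    return wt, mut


if __name__ == "__main__":
    field = "ABCC12:NM_033226:exon12:c.G1738T:p.G580C"
    gene, transcript, ref_aa, pos, alt_aa = parse_first_transcript(field)
    print(gene, transcript, ref_aa, pos, alt_aa)
    # Getting the peptides additionally needs the protein sequence for
    # `transcript`; a hypothetical dict {transcript_id: sequence} would do:
    # wt, mut = epitope_pair(proteins[transcript], ref_aa, pos, alt_aa)
```

The check that the reference residue really sits at the given position seems worth keeping, since a mismatch would suggest that the protein sequence and the annotated transcript do not correspond.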