Question

Convert VCF to new protein FASTA containing SNVs

8.8 years ago

mosquitoes • 0

Hello,

I am trying to write a new FASTA of protein sequences that contains all of the SNVs I have identified from WGS for one strain. I know roughly how to get there, but not which tools are available for each step.

So far, I have:

Created a new gene FASTA with GATK's FastaAlternateReferenceMaker, using the -L option to write only the genes into the FASTA.

I know I can use Biopython to translate this DNA FASTA into an amino acid FASTA, but every gene on the reverse strand is stored as the reverse complement of its coding sequence in the new gene FASTA. Is there a way either to convert the negative-strand genes to their reverse complement, or to tell the translating program which strand each gene is on? I could use the BED file as a reference/dictionary.

Thanks!

gatk biopython python fasta vcf • 2.6k views

updated 8.8 years ago by Brice Sarver ★ 3.8k • written 8.8 years ago by mosquitoes • 0

Does the FASTA file made by FastaAlternateReferenceMaker indicate which strand each gene was extracted from? For example, are there numbers in the '>' header line that make the strand obvious? Can you provide an example of this FASTA file?

8.8 years ago by Markus ▴ 320

score 0 · Answer 1 · 2016-07-26

8.8 years ago

Brice Sarver ★ 3.8k

See the .reverse_complement() method in Biopython.
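
As a minimal sketch of that call (the sequence below is a made-up toy example, not taken from the poster's data), a minus-strand gene stored in forward-strand orientation can be flipped and then translated like this:

```python
from Bio.Seq import Seq

# Hypothetical minus-strand gene as it would appear on the forward strand
# of the reference (toy sequence, for illustration only).
forward_strand = Seq("TTACATTGCCAT")

# Reverse complementing recovers the coding (sense) orientation.
coding = forward_strand.reverse_complement()

# Translate with the standard genetic code; to_stop=True ends the protein
# at the first in-frame stop codon instead of emitting '*'.
protein = coding.translate(to_stop=True)

print(coding)
print(protein)
```

Plus-strand genes would skip the `reverse_complement()` step and go straight to `translate()`.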

Right, but I need to do this for the whole genome, where approximately half of the genes in the current FASTA need to be reverse complemented. The only way to know which ones is by looking at the BED file.

8.8 years ago by mosquitoes • 0

A BED file is just a tab-delimited file. As you extract the positions for your genes of interest, also extract the strand information. Do a logical evaluation: if "-" then reverse complement the sequence. You'll want to evaluate this for each region you extract, i.e., for each exon.
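
The strand check described above could be sketched roughly as follows. This uses only the Python standard library so that each step is visible; Biopython's `Seq.reverse_complement()` and `Seq.translate()` could replace the two helpers. It assumes (and this is only an assumption, since the thread never shows the FASTA headers) that each sequence name matches the name in column 4 of the BED file and that the strand is in column 6. The function names and toy data are hypothetical, and exons would still need to be joined into a full coding sequence before translation.

```python
import csv
import io

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_strands(bed_handle):
    """Map each BED name (column 4) to its strand (column 6)."""
    strands = {}
    for row in csv.reader(bed_handle, delimiter="\t"):
        # Skip comments, header lines, and rows without a strand column.
        if len(row) >= 6 and not row[0].startswith(("#", "track", "browser")):
            strands[row[3]] = row[5]
    return strands


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate in frame 0, stopping at the first stop codon."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def orient_and_translate(records, strands):
    """records maps name -> forward-strand DNA; '-' genes are flipped first."""
    proteins = {}
    for name, seq in records.items():
        if strands.get(name) == "-":
            seq = reverse_complement(seq)
        proteins[name] = translate(seq)
    return proteins


# Toy inputs standing in for the real BED file and gene FASTA.
bed = io.StringIO("chr1\t0\t9\tgeneA\t0\t+\nchr1\t20\t32\tgeneB\t0\t-\n")
records = {"geneA": "ATGGCATAA", "geneB": "TTACATTGCCAT"}

proteins = orient_and_translate(records, read_strands(bed))
print(proteins)
```

If the FASTA headers turn out to be numbered by interval order rather than named, the lookup could be keyed on the record's position in the BED file instead.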

8.8 years ago by Brice Sarver ★ 3.8k