vcf to fasta CDS converter
10.0 years ago
mosquitoes • 0

I need to create a FASTA file containing only the CDS for each sample that I have sequenced (NGS) and genotyped using GATK. I've used GATK FastaAlternateReferenceMaker, then BEDTools with the .gff to pull out all the exons (or CDSs), but this does not join the coding sequences together for each gene. Also, GATK FastaAlternateReferenceMaker outputs a FASTA with chromosome names listed as chr1...etc. (i.e. not matching the names in the .gff). My genome has many contigs, and it is time-consuming to change these by hand. Is there a better way to do this? Are there any tools that go from a VCF file to a per-sample FASTA file containing the CDS for each gene?

I eventually need to feed this FASTA into PAML so I can calculate dN/dS for each gene. If there's a better way to do that, please let me know.

Thanks!

next-gen-sequencing • 4.0k views
10.0 years ago

Just use gtf2fasta (I think TopHat comes with such an executable, but if not, there are Python scripts out there) after changing the chromosome names in either the FASTA files or the GTF file so that the two match (you could just use awk to do that). You'll also need to modify the GTF so that it contains only the CDS entries, and then relabel those as "exon", since most conversion programs expect to build transcripts from exon records (again, you can do this with awk).
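If awk isn't handy, the two GTF edits described above (renaming the sequence IDs, and keeping only CDS records relabelled as "exon") could be sketched in Python roughly like this. It is only a sketch: the function name `cds_as_exons`, the `name_map` dictionary and the demo records are made up for illustration, and it assumes a standard 9-column, tab-separated GTF/GFF.

```python
def cds_as_exons(gtf_lines, name_map):
    """Yield CDS records with renamed sequence IDs and feature type 'exon'.

    gtf_lines: iterable of GTF/GFF lines (assumed 9 tab-separated columns).
    name_map:  hypothetical dict mapping annotation names to FASTA names.
    """
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Keep only well-formed CDS records.
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        # Rename the sequence ID if a mapping exists; otherwise leave it as-is.
        fields[0] = name_map.get(fields[0], fields[0])
        # Relabel so that transcript-oriented converters pick the record up.
        fields[2] = "exon"
        yield "\t".join(fields)


# Made-up demo records, not taken from any real annotation.
demo = [
    "##gff-version 3",
    'contig_1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
    'contig_1\tsrc\tCDS\t10\t30\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'contig_2\tsrc\tCDS\t5\t20\t.\t-\t0\tgene_id "g2"; transcript_id "t2";',
]
name_map = {"contig_1": "chr1"}  # assumed mapping: annotation name -> FASTA name

result = list(cds_as_exons(demo, name_map))
for record in result:
    print(record)
```

On a real file you would pass an open file handle instead of `demo` and write the yielded lines to a new GTF, which then goes to the converter together with the per-sample FASTA.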
