How to obtain protein-coding sequences from an assembled genome/exome dataset?
Posted by DNAngel, 5.3 years ago

I use bwa-mem to map my genome and exome reads against reference CDS sequences (a reference-guided assembly) so that I can work with just the CDS of my various species. So far, however, I have only been able to do this one CDS at a time, using individual CDS reference sequences from different reference species.
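Instead of mapping against one CDS at a time, the individual CDS references can be concatenated into a single multi-FASTA reference that is indexed once. A minimal sketch, assuming each input FASTA holds exactly one CDS and that the gene labels (the dict keys) are names you choose yourself; `parse_fasta` and `merge_cds_references` are hypothetical helpers, not part of bwa. Renaming every record to a single-word gene label keeps the reference names distinct, which the SAM format requires for its `@SQ` header lines, and lets each alignment be traced back to its gene later.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs; the name is the header's first word."""
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def merge_cds_references(per_gene_fasta: Dict[str, str]) -> str:
    """Combine one-CDS-per-file FASTA texts into one multi-FASTA reference.

    Each record is renamed to its (hypothetical) gene label, the dict key, so
    alignments can later be traced back to a gene through their RNAME field.
    """
    out: List[str] = []
    for gene, text in per_gene_fasta.items():
        # Reference names must be a single word and unique (dict keys are unique).
        if gene.split() != [gene]:
            raise ValueError(f"gene label {gene!r} must be a single word")
        records = parse_fasta(text)
        if len(records) != 1:
            raise ValueError(f"{gene}: expected one CDS record, found {len(records)}")
        out.append(f">{gene}\n{records[0][1]}\n")
    return "".join(out)
```

The merged text could then be written to a file (say `cds_ref.fa`, a hypothetical name), indexed once with `bwa index cds_ref.fa`, and used for every sample with `bwa mem cds_ref.fa reads.fq > sample.sam`.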

Of course, this is not feasible when I want to explore the entire genomic/exonic dataset and test for selection on all protein-coding genes in my species. I am not sure how to handle my raw single-end reads: should I download all the CDS sequences for the relevant reference species and map against them all as one file? My custom script ends by producing a single MSA file when a single CDS is used as the reference, so would using all CDS at once produce one giant MSA? I need individual MSAs, because I then have to run various models on each gene individually or BLAST them.
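Mapping against one combined CDS reference does not by itself merge the genes: every SAM alignment record carries the name of the reference sequence it hit (the RNAME column), so reads can be separated per gene after a single `bwa mem` run. The sketch below is a minimal pure-Python illustration, assuming plain-text SAM input with `@SQ` header lines, that calls a simple majority-rule consensus for each reference CDS. It ignores base qualities, drops insertions relative to the reference, and writes N at low-coverage or tied positions; `consensus_per_reference` is a hypothetical helper. For real data, established tools such as `samtools consensus` (samtools 1.16 or later) or a `bcftools call` plus `bcftools consensus` workflow are the safer choice.

```python
import re
from collections import Counter
from typing import Dict, List

# One CIGAR operation: a run length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def consensus_per_reference(sam_text: str, min_depth: int = 1) -> Dict[str, str]:
    """Majority-rule consensus for every @SQ reference in a SAM text.

    Bases are placed in reference coordinates; insertions relative to the
    reference are dropped, so each consensus has the same length as its
    reference CDS. Positions with depth < min_depth or a tied top base are 'N'.
    """
    counts: Dict[str, List[Counter]] = {}
    for line in sam_text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                tags = dict(field.split(":", 1) for field in fields[1:])
                counts[tags["SN"]] = [Counter() for _ in range(int(tags["LN"]))]
            continue
        flag, rname, cigar, seq = int(fields[1]), fields[2], fields[5], fields[9]
        if flag & SKIP_FLAGS or rname not in counts or seq == "*":
            continue
        columns = counts[rname]
        ref_pos = int(fields[3]) - 1  # SAM POS is 1-based
        query_pos = 0
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "M=X":  # consumes query and reference
                for i in range(n):
                    if 0 <= ref_pos + i < len(columns):
                        columns[ref_pos + i][seq[query_pos + i].upper()] += 1
                ref_pos += n
                query_pos += n
            elif op in "IS":  # consumes query only (insertion, soft clip)
                query_pos += n
            elif op in "DN":  # consumes reference only (deletion, skip)
                ref_pos += n
            # H and P consume neither query nor reference.
    consensus: Dict[str, str] = {}
    for name, columns in counts.items():
        bases = []
        for column in columns:
            top = column.most_common(2)
            depth = sum(column.values())
            tied = len(top) == 2 and top[0][1] == top[1][1]
            bases.append("N" if depth < min_depth or not top or tied else top[0][0])
        consensus[name] = "".join(bases)
    return consensus
```

Run once per species, this returns a `{gene: sequence}` dictionary, so the genes stay separate rather than ending up in one giant alignment.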

Any advice on how to do this most efficiently? End goal: obtain MSAs for all protein-coding genes in my genomic/exonic datasets so I can run various models testing for selection pressure on each gene.
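Once each species has one consensus sequence per gene, the per-species results can be regrouped so that each gene gets its own multi-FASTA file, which is then aligned on its own and analysed gene by gene. A minimal sketch, assuming a `{species: {gene: sequence}}` layout built from per-species consensus calls; `per_gene_fasta` and `codon_problems` are hypothetical helpers, and the stop-codon check assumes the standard genetic code. Codon-based selection models such as PAML's codeml need in-frame sequences without internal stop codons, so genes flagged here are worth inspecting first. A codon-aware aligner (for example MACSE) or a protein alignment back-translated with PAL2NAL helps keep the reading frame intact in each per-gene MSA.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def per_gene_fasta(consensus_by_species: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Regroup {species: {gene: seq}} into {gene: multi-FASTA text}, one file per gene."""
    by_gene: Dict[str, List[Tuple[str, str]]] = {}
    for species, genes in consensus_by_species.items():
        for gene, seq in genes.items():
            by_gene.setdefault(gene, []).append((species, seq))
    return {
        gene: "".join(f">{species}\n{seq}\n" for species, seq in records)
        for gene, records in by_gene.items()
    }


def codon_problems(seq: str) -> List[str]:
    """List reasons an unaligned CDS may be unsuitable as codon-model input."""
    seq = seq.upper()
    problems: List[str] = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # A terminal stop is tolerated here; stops anywhere else break the frame.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {index + 1}")
    return problems
```

Each value returned by `per_gene_fasta` can be written to its own file and aligned separately, which gives one MSA per gene instead of one alignment covering everything. A terminal stop codon passes this check but is usually trimmed before running codon models.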

Tags: PAML, bwa