5.2 years ago
andrespara
Hi,
I want to gather CDS sequences from NCBI or Ensembl. For some species there is no curated RefSeq assembly, only the link to "FTP directory for GenBank assembly". There I found GBFF files and genomic.fna files. If I only want the CDS, like I usually get from the RefSeq links, which file is the correct one?
I think I can convert GBFF files into FASTA using other programs, but I don't know how to filter the CDS out of the genomic.fna file (since I think it contains more than just CDS). Thanks for your help,
Andrés
Not sure why you see only GBFF and genomic files. Any time I get assemblies there are a bunch of other files, including cds_from_genomic.fna.gz, which is what you need. Can you give a link to an assembly?
I would not be surprised if they have only a genomic DNA file, but if they have .gff or .gbff files, it is straightforward to extract coding sequences or proteins from them. You may want to try any2fasta or gffread from Cufflinks. I am sure there are many other tools for converting .gff and .gbff into .fasta.
Thanks, I will try some of these tools.
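For anyone who ends up with only a genomic .fna and a .gff and wants to see what the conversion involves, here is a rough standard-library Python sketch. It is not how gffread or any2fasta work internally. The function names (read_fasta, extract_cds) are made up for illustration, and it assumes a GFF3 file whose CDS rows carry a Parent (or ID) attribute, a genome small enough to hold in memory, and one parent per CDS row. It ignores the phase column and partial CDS, so treat it as a starting point only.

```python
from collections import defaultdict

# Hypothetical helper names; a simplified sketch, not a replacement for gffread.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence} (id = first word of header)."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(gff_lines, genome):
    """Join the CDS segments of each parent feature into one spliced sequence.

    Assumes GFF3 with 1-based inclusive coordinates; phase is ignored.
    """
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        key = attrs.get("Parent") or attrs.get("ID")
        if key is None:
            continue
        segments[key].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    cds = {}
    for key, segs in segments.items():
        segs.sort(key=lambda s: s[1])  # ascending genomic start
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in segs)
        if segs[0][3] == "-":
            # Minus strand: reverse-complement the joined forward-strand pieces.
            seq = reverse_complement(seq)
        cds[key] = seq
    return cds


# Toy data, invented for this example.
toy_genome = read_fasta([">chr1 toy contig", "AAATGGCCT", "TTTAAGGG"])
toy_gff = [
    "##gff-version 3",
    "chr1\ttoy\tCDS\t3\t5\t.\t+\t0\tID=cds-1;Parent=rna-1",
    "chr1\ttoy\tCDS\t12\t14\t.\t+\t0\tID=cds-1;Parent=rna-1",
    "chr1\ttoy\tCDS\t6\t8\t.\t-\t0\tID=cds-2;Parent=rna-2",
]
toy_cds = extract_cds(toy_gff, toy_genome)
for name, seq in toy_cds.items():
    print(f">{name}\n{seq}")
```

For real files you would open the .fna and .gff (with gzip.open in text mode if they are compressed) and pass the file handles in place of the toy lists. If the assembly directory does contain cds_from_genomic.fna.gz, that file is still the simpler route.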