Evening.
I have de novo assembled whole-genome DNA sequences and an EMBL annotation file that contains the CDS regions for multiple genes. Are there any tools that can first parse the EMBL annotation file to obtain the CDS positions, and then use those positions to convert each DNA FASTA file into a multi-FASTA file containing the translated amino acid sequence (.faa) for every CDS range given in the EMBL file, for every isolate?
Thanks
It's the kind of tool that bioinformaticians usually brew themselves. I am not aware of an existing tool that does it, but in principle it's quite easy to achieve as long as you have the genetic code in the form of a dictionary / hash.
E.g., for Python:
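A minimal sketch of the dictionary approach, assuming the standard genetic code (NCBI translation table 1). The names `CODON_TABLE` and `translate` are just illustrative. As far as I know, the bacterial table 11 assigns the same amino acids to each codon and differs only in the permitted start codons, so this should also work for most prokaryotic CDS, but check which table applies to your organism. Codons containing ambiguous bases are reported as `X`, and incomplete trailing codons are ignored.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, listed in TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...), as in the NCBI translation table listing.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Genetic code as a dictionary: codon -> one-letter amino acid ('*' = stop)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=False):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; leftover bases at the end are skipped
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`, and passing `to_stop=True` drops the terminal stop.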
If you already have the CDS multifasta file, translating it is fairly trivial. Do you already have this or need to generate it from the EMBL file?
No, we do not have a CDS multifasta file yet; we need to generate it from the EMBL file.