7.6 years ago
h.l.wong
Hi all,
I am currently working on metagenomics and would like to know whether there is a way to extract a protein sequence from a FASTA file. Thanks.
Cheers
Alan
If you know the sequence name, use bioawk: https://github.com/lh3/bioawk
This selects the sequence you want from the main.fa file and writes it to the selected.fa file.
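For readers without bioawk, the same idea (pick one record by name and write it to a new file) can be sketched in plain Python. This is only an illustration, not the bioawk command from the answer above: the function names `read_fasta`, `select_by_name`, and `extract_to_file` are made up for this example, and it assumes the sequence name is the first whitespace-delimited word of the header line.

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def select_by_name(lines: Iterable[str], name: str) -> Optional[Tuple[str, str]]:
    """Return the first record whose name matches, or None if absent."""
    for header, seq in read_fasta(lines):
        # Assumption: the name is the first word of the header.
        if header.split(None, 1)[:1] == [name]:
            return header, seq
    return None


def extract_to_file(in_path: str, name: str, out_path: str) -> bool:
    """Copy the named record from in_path to out_path; return True if found."""
    with open(in_path) as handle:
        record = select_by_name(handle, name)
    if record is None:
        return False
    with open(out_path, "w") as out:
        out.write(f">{record[0]}\n{record[1]}\n")
    return True
```

Calling `extract_to_file("main.fa", "my_sequence_name", "selected.fa")` would mirror the main.fa to selected.fa workflow described above, where `my_sequence_name` is a placeholder for your own sequence ID.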
What kind of FASTA file (single, multi-FASTA, DNA, protein)?
Thanks. I have metagenomics data; do I extract the sequences from the assembled contigs file, or do I need to extract them from other files?
And if I have a protein annotated in KEGG, how can I get the corresponding nucleic acid sequence?
Thanks
Prodigal or GeneMarkS work well and are fast.
Thanks, should I run Prodigal on the assembled contigs file to extract the sequences?
Yes, it is possible (nucleotide, protein, or both). Read the manual, and take care with the translation table that you use.
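To see why the translation table matters, here is a minimal Python sketch (not Prodigal's own code) that translates the same nucleotide sequence under two NCBI genetic codes. It assumes table 11 (bacterial, archaeal and plant plastid) uses the standard codon-to-amino-acid assignments and that table 4 differs by reading TGA as tryptophan instead of stop; the helper names `codon_table` and `translate` are invented for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard amino-acid assignments in NCBI codon order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(overrides=None):
    """Build a codon -> amino acid dict, optionally overriding some codons."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD_AA))
    table.update(overrides or {})
    return table


TABLE_11 = codon_table()              # same assignments as the standard code
TABLE_4 = codon_table({"TGA": "W"})   # TGA read as Trp rather than stop


def translate(seq, table):
    """Translate frame 1 of seq; unknown codons become 'X', a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


gene = "ATGAAATGAGGCTAA"
print(translate(gene, TABLE_11))  # MK*G*
print(translate(gene, TABLE_4))   # MKWG*
```

With the wrong table, a gene from an organism that reads TGA as tryptophan would appear to end early at the internal TGA, so the predicted protein would be truncated.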
It would help if you clarified your question. Do you mean extracting sequences based on the header, like this?