Extracting a protein sequence from a FASTA file
7.6 years ago
h.l.wong ▴ 70

Hi all,

I am currently working on metagenomics and I would like to know if there's any way that I can extract a protein sequence from a fasta file? Thanks.

Cheers

Alan

gene sequence • 3.6k views

If you know the sequence name, you can use bioawk (https://github.com/lh3/bioawk):

SEQNAME="<insert sequence name>"; bioawk -v x="$SEQNAME" -c fastx '$name == x {print ">"$name"\n"$seq}' main.fa > selected.fa

This will select the sequence you want from the main.fa file and print it to the selected.fa file.
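If bioawk is not installed, the same lookup can be sketched in plain Python. This is only a minimal illustration, not a replacement for bioawk: the helper names `read_fasta` and `extract_by_name` are made up for this example, and it assumes the sequence name is the header text up to the first whitespace.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file (single- or multi-record)."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, "".join(chunks)
                # Assumption: the name is the header up to the first whitespace.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                chunks = []
            elif name is not None and line:
                # Sequence lines may be wrapped, so collect and join them.
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_by_name(path, wanted):
    """Return the sequence of the first record named `wanted`, or None."""
    for name, seq in read_fasta(path):
        if name == wanted:
            return seq
    return None
```

For example, `extract_by_name("main.fa", "<insert sequence name>")` returns the sequence as one unwrapped string, or `None` if no record has that name.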


What kind of FASTA file is it (single, multi-FASTA, DNA, protein)?


Thanks. I have metagenomics data. Should I extract the sequences from the assembled contigs file, or do I need to extract them from other files?

And if I have an annotated protein (in KEGG), how can I get the nucleic acid sequence?

Thanks


Prodigal or GeneMarkS work well and are fast.


Thanks. Should I run Prodigal on the assembled contigs file to extract the sequences?


Yes, it is possible. Prodigal can write the predicted genes as nucleotide sequences, protein sequences, or both. Read the manual, and take care over which translation table you use.

http://prodigal.ornl.gov/
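
As a rough sketch of what such a run on the contigs might look like, the snippet below only builds the Prodigal command line and leaves running it to you. The function name and the file names are placeholders invented for this example. The flags (`-i`, `-a`, `-d`, `-o`, `-f`, `-p`, `-g`) are Prodigal's documented options, but how `-g` interacts with `-p meta` is something to confirm in the manual, so the translation table is passed only when you set it explicitly.

```python
import subprocess


def prodigal_command(contigs, proteins_out, genes_out, coords_out,
                     mode="meta", table=None):
    """Build a Prodigal command line as a list of arguments.

    All paths are placeholders chosen by the caller.
    """
    cmd = [
        "prodigal",
        "-i", contigs,        # input: assembled contigs (nucleotide FASTA)
        "-a", proteins_out,   # output: protein translations of predicted genes
        "-d", genes_out,      # output: nucleotide sequences of predicted genes
        "-o", coords_out,     # output: gene coordinates
        "-f", "gff",          # format of the coordinates file
        "-p", mode,           # "meta" for metagenomes, "single" for one genome
    ]
    if table is not None:
        # Only pass a translation table when explicitly requested;
        # check the manual for how this behaves in meta mode.
        cmd += ["-g", str(table)]
    return cmd


def run_prodigal(cmd):
    """Run the command; raises CalledProcessError if Prodigal fails."""
    subprocess.run(cmd, check=True)
```

For example, `run_prodigal(prodigal_command("contigs.fa", "proteins.faa", "genes.fna", "genes.gff"))` would need Prodigal on your PATH. The `-a` and `-d` files use the same record headers, so once a protein has been annotated, its nucleotide sequence can be looked up in the `-d` file by the same ID.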

It would help if you clarified your question. Do you mean extracting sequences based on the header, like this?
