Extracting a protein sequence from a FASTA file

8.1 years ago

h.l.wong ▴ 70

Hi all,

I am currently working on metagenomics and would like to know whether there is a way to extract a protein sequence from a FASTA file. Thanks.

Cheers

Alan

gene sequence • 3.8k views

ADD COMMENT • link updated 8.0 years ago by Biostar 20 • written 8.1 years ago by h.l.wong ▴ 70

If you know the sequence name, you can use bioawk (https://github.com/lh3/bioawk):

SEQNAME="<insert sequence name>"; bioawk -v x="$SEQNAME" -c fastx '{if ($name==x) {print ">"$name"\n"$seq}}' main.fa > selected.fa

This will select the sequence you want from the main.fa file and print it to the selected.fa file.
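For readers without bioawk installed, the same exact-name selection can be sketched in plain Python. This is a minimal illustration, not a replacement for a tested parser: the function names `read_fasta` and `select_by_name` are hypothetical, and the sketch assumes, as the bioawk command does, that the record name is the header text up to the first whitespace.

```python
def read_fasta(lines):
    """Yield (name, comment, sequence) for each record in an iterable of FASTA lines."""
    name, comment, chunks = None, "", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, comment, "".join(chunks)
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped; join them into one string.
            chunks.append(line.strip())
    if name is not None:
        yield name, comment, "".join(chunks)


def select_by_name(lines, wanted):
    """Return FASTA text for every record whose name equals `wanted` exactly."""
    out = []
    for name, _comment, seq in read_fasta(lines):
        if name == wanted:
            out.append(f">{name}\n{seq}\n")
    return "".join(out)


# Small in-memory example; for a real file, pass an open file handle instead.
example = [">seq1 first protein", "MKT", "AYI", ">seq2", "MSTN"]
print(select_by_name(example, "seq1"), end="")
```

Like the bioawk one-liner, this writes only the name (not the rest of the header) and puts the sequence on a single line.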

ADD REPLY • link 8.1 years ago by Matteo Schiavinato ★ 3.7k

What kind of FASTA file is it (single or multi-FASTA, DNA or protein)?

ADD REPLY • link 8.1 years ago by GenoMax 151k

Thanks. I have metagenomics data. Do I extract the sequences from the assembled contigs file, or do I need to extract them from other files?

And if I have an annotated protein (in KEGG), how can I get the nucleic acid sequence?

Thanks

ADD REPLY • link 8.1 years ago by h.l.wong ▴ 70

Prodigal or GeneMarkS works well and is fast.

ADD REPLY • link 8.1 years ago by Buffo ★ 2.4k

Thanks, should I use Prodigal on the assembled contigs file to extract the sequences?

ADD REPLY • link 8.1 years ago by h.l.wong ▴ 70

Yes, it is possible; read the manual for details. Prodigal can output nucleotide sequences, protein sequences, or both. Take care with the translation table that you use.

http://prodigal.ornl.gov/
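To see why the translation table matters, here is a small self-contained sketch that translates the same DNA under two NCBI genetic codes: table 11 (bacterial, archaeal and plant plastid) and table 4 (used by Mycoplasma and Spiroplasma, among others), where TGA encodes tryptophan instead of a stop. The `translate` function is hypothetical and only illustrates the effect; it is not how Prodigal works internally.

```python
from itertools import product

# Codons in NCBI order: first base varies slowest, bases ordered T, C, A, G.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

# Amino acid strings for two NCBI translation tables, aligned with CODONS.
AMINO_ACIDS = {
    11: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def translate(dna, table=11):
    """Translate a DNA (or RNA) string in frame 1; unknown codons become 'X'."""
    codon_map = dict(zip(CODONS, AMINO_ACIDS[table]))
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(codon_map.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


orf = "ATGTGAAAATAA"
print(translate(orf, table=11))  # M*K*  (TGA read as a stop)
print(translate(orf, table=4))   # MWK*  (TGA read as tryptophan)
```

With the wrong table, a gene caller would truncate or over-extend predicted proteins at every TGA, so it is worth checking which code fits the organisms expected in the sample.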

ADD REPLY • link 8.1 years ago by Buffo ★ 2.4k

It would help if you clarified your question. Do you mean extracting sequences based on the header, like this?

ADD REPLY • link 8.1 years ago by Rohit ★ 1.5k
