Extract fasta-headers that have the same word
5.7 years ago

Hey guys,

I have a multi-fasta protein file like this

>SF_hydrolase MKG...
>LH_reductase MKI...
>SM_hydrolase MSN...

Basically, I would like to extract only the fasta headers that contain the word "reductase". I know how to extract headers that match the ones present on a list, but I don't know how to extract fasta headers based solely on one of the words. Hope you can help me! Cheers

Assembly sequencing • 1.3k views
5.7 years ago
GenoMax 147k

grep is your friend. It prints every line that contains the word (or pattern) you specify.
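As a minimal sketch of that idea: the file name proteins.fasta and the sequences below are made-up placeholders (only the header names come from the question), and the example assumes each header sits on its own line starting with ">".

```shell
# Hypothetical sample file; the sequences are invented placeholders.
cat > proteins.fasta <<'EOF'
>SF_hydrolase
ACDEFGHIK
>LH_reductase
LMNPQRSTV
>SM_hydrolase
WYACDEFGH
EOF

# Keep only header lines (starting with ">") that contain "reductase".
grep '^>.*reductase' proteins.fasta > reductase_headers.txt
cat reductase_headers.txt
```

Anchoring the pattern with '^>' restricts the match to header lines, so the search word is never looked up inside the sequences themselves.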


...and you can use grep -A (lines after) and grep -B (lines before) to also extract lines following or preceding the line on which the matching keyword was found.
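A rough sketch of the -A variant, assuming every sequence occupies exactly one line after its header. The file name and sequences are again invented for illustration.

```shell
# Hypothetical sample file with one sequence line per record.
cat > proteins.fasta <<'EOF'
>SF_hydrolase
ACDEFGHIK
>LH_reductase
LMNPQRSTV
>SM_hydrolase
WYACDEFGH
EOF

# -A 1 prints each matching header plus the one line that follows it.
grep -A 1 '^>.*reductase' proteins.fasta > reductase.fasta
cat reductase.fasta
```

With several non-adjacent matches, grep puts a "--" separator line between the groups, so the output may need a small cleanup before it is valid fasta again.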


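That's a valid option, but fasta sequences are sometimes wrapped over several lines, in which case a fixed number of context lines will not reliably capture the whole sequence. Why not use grep to find the headers and then pull out the records with a fasta parser?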
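For wrapped sequences, one possible workaround (swapping in a small awk filter rather than a dedicated fasta parser) is to track whether the current record's header matched and print lines only while it did. The file name and sequences are hypothetical.

```shell
# Hypothetical sample file whose sequences are wrapped over two lines.
cat > wrapped.fasta <<'EOF'
>SF_hydrolase
ACDEFGHIK
LMNP
>LH_reductase
LMNPQRSTV
WYAC
>SM_hydrolase
WYACDEFGH
EOF

# On each header line, set keep to 1 if the header contains "reductase", else 0.
# The bare pattern "keep" then prints every line while keep is non-zero.
awk '/^>/ { keep = ($0 ~ /reductase/) } keep' wrapped.fasta > reductase_wrapped.fasta
cat reductase_wrapped.fasta
```

Because the decision is made once per header and carried until the next one, it does not matter how many lines a sequence spans.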
