5.7 years ago
genomes_and_MGEs
Hey guys,
I have a multi-FASTA protein file like this:
>SF_hydrolase
MKG...
>LH_reductase
MKI...
>SM_hydrolase
MSN...
Basically, I would like to extract only the FASTA headers that contain the word "reductase". I know how to extract headers that match the ones in a list, but I don't know how to extract FASTA headers based solely on one of the words. Hope you can help me! Cheers
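A minimal sketch of the headers-only case in Python, assuming each header is its own line starting with ">" and that a plain substring match on the keyword is good enough (so "LH_reductase" matches "reductase"). The function name headers_with_keyword and the toy input are hypothetical; in practice you would pass an open file handle instead of the in-memory example.

```python
import io
from typing import List, TextIO


def headers_with_keyword(handle: TextIO, keyword: str) -> List[str]:
    """Return header lines (without the leading '>') that contain keyword."""
    return [line[1:].strip() for line in handle
            if line.startswith(">") and keyword in line]


# Toy input mirroring the question (sequences truncated to three residues).
example = """>SF_hydrolase
MKG
>LH_reductase
MKI
>SM_hydrolase
MSN
"""

print(headers_with_keyword(io.StringIO(example), "reductase"))
```

With a real file this would be called as headers_with_keyword(open("proteins.faa"), "reductase"), where "proteins.faa" is a placeholder name.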
...and you can use grep -A and grep -B (each takes a line count, e.g. grep -A 1) to extract lines after or before the line on which the matching keyword was found.

That's a valid option, but FASTA sequences are sometimes wrapped over several lines, in which case you can't know in advance how many lines to grab. Why not use grep to find the matching headers and a FASTA parser to pull out the full records?