How can I grep the complete sequences that contain a specific motif in a FASTA file? I also want to include the header lines beginning with ">" that precede these target sequences.
First, you'd have to linearize your sequences so that the DNA of each record is all on one line. Without this step, a line-based search would miss motif hits that span a line break.
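One common way to do the linearization is an awk one-liner. The following is only a sketch: the file names `wrapped.fa` and `linear.fa` and the sequences in them are made up for illustration, and the script assumes Unix line endings.

```shell
# Hypothetical wrapped input: the motif KME in seq2 is split across two lines.
cat > wrapped.fa <<'EOF'
>seq1
MKTAYIAKQR
QISFVKSHFS
>seq2
GGAVLK
MEPTRW
EOF

# Print each header as-is and join all sequence lines of a record into one line.
awk '/^>/ { if (seq != "") print seq; print; seq = ""; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' wrapped.fa > linear.fa

cat linear.fa
```

After this step, seq2 reads GGAVLKMEPTRW on a single line, so a line-based search for KME can find it.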
Test file:
To extract all sequences with KME in them, ignoring case, you can use seqkit grep.
seqkit has to be downloaded and installed first. The options used are: -s = match against the sequence only (not the header); -r = pattern is a regular expression; -i = ignore case; -p = search pattern.
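A minimal sketch of such a call, using only the four options listed above. The file `test.fa` and its contents are hypothetical stand-ins, and the check for seqkit on the PATH is just a convenience so the snippet fails gracefully when seqkit is not installed.

```shell
# Hypothetical test file: seq2 has the motif split by a line break,
# seq3 has it in lower case, seq1 has no KME at all.
cat > test.fa <<'EOF'
>seq1
MKTAYIAKQR
>seq2
GGAVLK
MEPTRW
>seq3
ttakmeyy
EOF

if command -v seqkit >/dev/null 2>&1; then
  # -s: search the sequence, -r: regex pattern, -i: ignore case, -p: the pattern
  hits=$(seqkit grep -s -r -i -p KME test.fa)
  printf '%s\n' "$hits"
else
  echo "seqkit not found on PATH; install it first" >&2
fi
```

Because seqkit parses whole FASTA records, it does not need the input to be linearized first.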
If the FASTA sequences are already linearized (i.e., each sequence is on a single line), a plain grep on the sequence lines is sufficient.
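For linearized input, one possible approach is grep with one line of leading context. This is a sketch rather than a definitive recipe: `linear.fa`, `hits.fa`, and the sequences are hypothetical, and `-B` is an extension found in GNU and BSD grep rather than part of POSIX.

```shell
# Hypothetical linearized input: one header line, then one sequence line.
cat > linear.fa <<'EOF'
>seq1
MKTAYIAKQR
>seq2
GGAVLKMEPTRW
>seq3
ttakmeyy
EOF

# -i: ignore case; -B1: also print the line before each match (the header).
# The second grep drops the "--" separators that grep puts between match groups.
grep -i -B1 'KME' linear.fa | grep -v '^--$' > hits.fa

cat hits.fa
```

One caveat: this matches any line, so a header that itself contains KME would also be reported, together with the sequence line above it. The seqkit -s option avoids that by searching sequences only.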