How can I grep the complete sequences containing a specific motif in a FASTA file using a shell command? I also want to include the header lines beginning with > that precede these target sequences.
I found this post about grepping the complete sequences containing a specific motif in a FASTA file, which is similar to my problem, but I'm looking for a different motif.
My motif looks like this:
SXXXX(F/S)XXXL
Each sequence in my FASTA file is on a single line, and I have more than 300 sequences.
For example, my sequences:
>sp|Q9H257.2|CARD9_HUMAN RecName: Full=Caspase recruitment domain-containing protein 9; Short=hCARD9
MSDYENDDECWSVLEGSRVTLTSVIDRSRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMTEVMKLQKKVQDLTALLSSK
>sp|Q9H37.2|CTYU_HUMAN
HHHSVLEGFRVTLTSVIDRFRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMTEVMKLQKKVQDLTALLSSK
>sp|Q9re7.2|CARer_HUMAN RecName
BKLSVLEGWRVTLTSVIDRFRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMTEVMKLQKKVQDLTALLSSK
The result should be only the first two sequences, because they contain the motif SXXXX(F/S)XXXL:
>sp|Q9H257.2|CARD9_HUMAN RecName: Full=Caspase recruitment domain-containing protein 9; Short=hCARD9
MSDYENDDECWSVLEGSRVTLTSVIDRSRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMTEVMKLQKKVQDLTALLSSK
>sp|Q9H37.2|CTYU_HUMAN
HHHSVLEGFRVTLTSVIDRFRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMTEVMKLQKKVQDLTALLSSK
I tried this command, but it returned all three sequences:
grep 'S...F\|S\|L.\(.\)\1\{4\}' jara3.fasta -B 1 > jara4.fasta
What if I want to keep only the sequences that don't contain the motif SXXXX(F/S)XXXL and save them in a new FASTA file?
In that case the result should be only this sequence:
>sp|Q9re7.2|CARer_HUMAN RecName
BKLSVLEGWRVTLTSVIDRFRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMTEVMKLQKKVQDLTALLSSK
Read man grep, and in particular read about the -v option, and think about how you would use it with the accepted answer.
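For what it's worth, here is a minimal sketch of how that could look. It is one possible approach, not necessarily what the accepted answer does. The command in the question returns everything because its \|S\| alternative matches any line that contains an S. The sketch assumes a grep that supports -B (GNU and BSD grep do) and one sequence per line, as in the question. The output name jara5.fasta and the scratch directory are made up for illustration, and the sample sequences are truncated copies of the ones in the question.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: the three records from the question, with the sequences
# truncated for brevity. Each sequence must sit on a single line.
cat > jara3.fasta <<'EOF'
>sp|Q9H257.2|CARD9_HUMAN RecName: Full=Caspase recruitment domain-containing protein 9; Short=hCARD9
MSDYENDDECWSVLEGSRVTLTSVIDRSRITPYL
>sp|Q9H37.2|CTYU_HUMAN
HHHSVLEGFRVTLTSVIDRFRITPYL
>sp|Q9re7.2|CARer_HUMAN RecName
BKLSVLEGWRVTLTSVIDRFRITPYL
EOF

# SXXXX(F/S)XXXL written as a basic regular expression:
# S, any 4 characters, F or S, any 3 characters, L.
motif='S....[FS]...L'

# Records WITH the motif: print each matching line plus the line before it
# (the header). The second grep drops the "--" separator lines that grep
# prints between non-adjacent groups of context.
grep -B 1 -e "$motif" jara3.fasta | grep -v '^--$' > jara4.fasta

# Records WITHOUT the motif: with -v, a line is selected only if it matches
# neither pattern, i.e. it is not a header and does not contain the motif.
# -B 1 then adds the header that precedes each selected sequence line.
grep -v -B 1 -e '^>' -e "$motif" jara3.fasta | grep -v '^--$' > jara5.fasta
```

With this sample, jara4.fasta gets the first two records and jara5.fasta gets only the CARer_HUMAN record. One caveat: the first command also matches header lines, so a header that happens to contain the motif would be selected as well. If that can happen in your data, a tool that handles header/sequence pairs explicitly (awk, for example) is the safer choice.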