5.0 years ago
ysas
I am trying to extract several sequences from a FASTA file using IDs that partially match the headers. I wrote the following command, but it returns only one sequence.
grep -Fwf ID_list.txt -A1 input.fasta >> output.fa
Here is input.fasta
>xxx|Issori2|100290|CE99543_15407
ATGGCTGTCAAGATTAGGAAACCACAGTACAAAGAAAGAGGCATTACTTGGGAAGATCAATCAGTTGTCC....
>xxx|Issori2|100354|CE99607_9185
ATGTCCCATATTGTTCGTATACCCAATGTCTTTGATCACAACTCTGACCTCCCAATACCTG......
>xxx|Issori2|100388|CE99641_51257
ATGTCACAAGAAAAACATTGGAACTATACCAAAGATATTGTCAGGACATCGATTTCTGGTGTCTGTGC......
Here is my ID_list.txt
CE101211_3315
CE99767_31939
CE99607_9185
CE99543_15407
Here is output.fa
>xxx|Issori2|100290|CE99543_15407
ATGGCTGTCAAGATTAGGAAACCACAGTACAAAGAAAGAGGCATTACTTGGGAAGATCAATCAGTTGTCC....
Somehow I only get the sequence that matches the last ID in the text file. Could you please point out how I can get all the sequences that match the ID list?
Thank you for your help.
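One possible explanation, not confirmed anywhere in this thread: if ID_list.txt was saved with Windows (CRLF) line endings, every ID except the last carries an invisible trailing carriage return, so only the final ID can match a header. The sketch below strips carriage returns before running the same grep. The function name extract_by_id is hypothetical, and the approach assumes each sequence sits on a single line, as in the example above.

```shell
# extract_by_id ID_LIST FASTA
# Prints every FASTA record whose header contains one of the IDs as a whole word.
# Assumption: each sequence occupies exactly one line (required by -A1).
extract_by_id() {
    ids_clean=$(mktemp)
    # Remove carriage returns (CRLF files) and blank lines from the ID list;
    # a blank pattern line could otherwise match far more than intended.
    tr -d '\r' < "$1" | sed '/^$/d' > "$ids_clean"
    # -F fixed strings, -w whole-word match, -f read patterns from a file,
    # -A1 also print the sequence line that follows each matching header.
    # The second grep drops the "--" separators that -A1 inserts between
    # non-adjacent matches.
    grep -Fwf "$ids_clean" -A1 "$2" | grep -v '^--$'
    rm -f "$ids_clean"
}
```

It could be called as `extract_by_id ID_list.txt input.fasta > output.fa`. Running `file ID_list.txt` or `cat -A ID_list.txt` (GNU cat) should show whether the list really contains CRLF line endings.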
With seqkit, it works. Thank you for your help!
Hello YusukeSasaki,
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.