I have a FASTA file, myfasta.fasta,
like this:
>aat.2.2344.a
ATTGCCGGTTTAATATTA
>aat.2.d2344.acc
ATTGCCGGTTTAATAAA
>aat.2.2bb344.a
ATTGCCGGTTTAATAGGAGAGAATT
>aat.2.2ccc344.a
ATTGCCGGTTTAATAGGGAG
>aat.2.2344.acc
ATTGCCGGTTTAATAAA
I also have a text file, my.txt,
which contains a sequence that matches some of the sequences in the FASTA file above:
ATTGCCGGTTTAATAAA
Based on this sequence, I want to extract the IDs of all records in the FASTA file whose sequence matches it. Can someone please help me with this? Thanks!
The result I want is:
>aat.2.2344.acc
>aat.2.d2344.acc
Are the sequences all on one line? If so, you can just use
grep -B 1 ...
Yes, they are 50 bp reads.
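Since each read sits on a single line, the grep -B 1 suggestion above could be filled out roughly as follows. This is only a sketch under a few assumptions: the file names are the ones from the question, my.txt holds one sequence per line with no trailing whitespace or Windows line endings, and your grep supports -B (GNU and BSD grep do, but it is not part of POSIX). The scratch directory and the `ids` variable are just there to make the example self-contained.

```shell
# Recreate the example files from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > myfasta.fasta <<'EOF'
>aat.2.2344.a
ATTGCCGGTTTAATATTA
>aat.2.d2344.acc
ATTGCCGGTTTAATAAA
>aat.2.2bb344.a
ATTGCCGGTTTAATAGGAGAGAATT
>aat.2.2ccc344.a
ATTGCCGGTTTAATAGGGAG
>aat.2.2344.acc
ATTGCCGGTTTAATAAA
EOF
printf 'ATTGCCGGTTTAATAAA\n' > my.txt

# -f my.txt : read the query sequence(s) from my.txt
# -F        : treat them as fixed strings, not regular expressions
# -x        : require the whole line to match, so longer reads that merely
#             contain the query are not reported
# -B 1      : also print the line before each hit, i.e. the FASTA header
# The second grep keeps only the header lines (and drops the "--" separators).
ids=$(grep -B 1 -x -F -f my.txt myfasta.fasta | grep '^>')
printf '%s\n' "$ids"
```

On the example data this prints >aat.2.d2344.acc and >aat.2.2344.acc, in the order they appear in the FASTA file. If the sequences were wrapped over several lines this approach would not work, and the records would need to be joined first.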
Dear MAPK, if you usually work with FASTA files, you may find SEDA (http://www.sing-group.org/seda/) a useful tool. It has a great variety of operations to manipulate, filter, and transform FASTA files (check out the manual to see all of them: https://www.sing-group.org/seda/manual/index.html). It also allows you to explore a set of FASTA files and extract only the information you need, such as the sequence identifiers (see https://www.sing-group.org/seda/manual/graphical-user-interface.html#the-input-area).
With best regards, Hugo.