Hi all,
I have a list of 90 sequences and 35 datasets in fasta format (data1.fasta, data2.fasta, ...., data35.fasta).
The list of 90 sequences is a tab-separated text file:
un1 aggcgaa
un2 caagacc
...
un90 gccagc
Now I would like to find out which of the datasets each sequence from the list is present in.
Basically, I think the grep command would work:
grep "aggcgaa" data1.fasta
grep "aggcgaa" data2.fasta
...
grep "aggcgaa" data35.fasta
If it is present in data1.fasta, print "data1.fasta", and so on.
But it is too time-consuming to do that by hand for each of the 90 sequences.
Could anybody tell me how to use a loop to do this job? I have no idea how to write a loop for this kind of work, but I suppose there should be a way.
Thanks.
Hi, ashutoshmits, thanks for your reply.
The script should be like
But I am not familiar with Linux loops. I am stuck on how to extract the seqs with "for i in $(? )" at the beginning of the loop. Could you show me the script here? Thanks again for your help.
If you have replaced
?
in the sample script with
cat file_with_fasta_to_search
then replace
grep -l i fasta_folder/*.fasta | wc -l
with
grep $i fasta_folder/*.fasta
This should give you the sequences found in your fasta files.
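Putting the pieces from this thread together, here is a minimal sketch of such a loop. It is only one possible way to do it, and it rests on a few assumptions: the function name find_seqs and the file name list.txt are made up for illustration, the list has the name in the first column and the sequence in the second, and each sequence sits on a single line in the fasta files (grep works line by line, so a match that spans a wrapped line would be missed). The -i option makes the match case-insensitive, and -F treats the sequence as a plain string rather than a regular expression.

```shell
# find_seqs LIST FASTA...   (hypothetical helper, not from the original posts)
# For every "name<TAB>sequence" line in LIST, print the name, the sequence,
# and the FASTA files that contain the sequence ("none" if no file does).
find_seqs() {
    list=$1
    shift
    # read splits on tabs and spaces; the "|| [ -n ... ]" part keeps a
    # final line that has no trailing newline
    while read -r name seq || [ -n "$name" ]; do
        # skip blank lines and lines without a sequence column
        [ -z "$seq" ] && continue
        # grep -l prints only the names of the files that match;
        # paste joins those names onto one line, separated by spaces
        hits=$(grep -l -i -F -e "$seq" -- "$@" < /dev/null | paste -sd ' ' -)
        printf '%s\t%s\t%s\n' "$name" "$seq" "${hits:-none}"
    done < "$list"
}
```

With the function defined in the current shell, it could be called as: find_seqs list.txt data*.fasta
Each output line then shows one sequence from the list followed by the datasets it was found in.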