You will need to learn how to compare one against one before attempting all vs all. Yes using "for" in linux will be important, but what do you hope to achieve? What kind of comparison? I feel this task is too large, all vs all is not going to be meaningful.
makeblastdb -in all.fasta -dbtype prot -title all -out all -hash_index
blastp -query all.fasta -db all -out all-vs-all -outfmt 6
Better still:
makeblastdb -in all.fasta -dbtype prot -title all -out all -hash_index
blastp -query all.fasta -db all -out all-vs-all -outfmt 6 -use_sw_tback -seg yes -soft_masking true -max_target_seqs 20000 -num_threads "MAXonYourSystem"
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
for i in {4..6}
do
awk '$11 <= 1e-"$i"' all-vs-all > all-vs-all.1e-"$i"
done
And all the other stuff like e.g. sorting for best hits, but really, your question was way too vague to say what should be done. E.g., if the seqs are from NCBI, you could include taxid_map in makeblastdb arguments and make use of that in blastp..
That's really great, thank you very much! My goal is to find a correct way to search for the best hit.
It turned out to be rather complicated. "Sorting for best hit" - could you, please, explain to me, how to do it correctly!
My seqs are from NCBI. Honestly, I don't know how to do this second part: "you could include taxid_map in makeblastdb arguments and make use of that in blastp". It would be extremely useful to me to learn this. Please-please-please, give me more details! Many-many thanks!
That's really great, thank you very much! My goal is to find a correct way to search for the best hit.
It turned out to be rather complicated. "Sorting for best hit" - could you, please, explain to me, how to do it correctly!
My seqs are from NCBI. Honestly, I don't know how to do this second part: "you could include taxid_map in makeblastdb arguments and make use of that in blastp". It would be extremely useful to me to learn this. Please-please-please, give me more details!
Many-many thanks!
P.S. I'm not sure where I have to place my answer, so I've just repeat it here.
As to taxid_map, this file links all protein GIs to NCBI taxon IDs, so you can use it with the taxid_map argument. But then you also need to use the '-parse_seqids' flag in makeblastdb. Then you also need NCBI tax_db in your db dir. I believe all this is documented and explained further in blast manual. Anyway, then when you have this stuff working you can do e.g. -outfmt '6 std sscinames' to have 13 column output with species name as the last column..
You will need to learn how to compare one against one before attempting all vs all. Yes using "for" in linux will be important, but what do you hope to achieve? What kind of comparison? I feel this task is too large, all vs all is not going to be meaningful.