I've a perl script which executes a blastn analysis. I want to parse the output of blast results attending to the kingdom of the best hit. For example, if I'm analysing arabidopsis data, I want to be able to extract those sequences with a plantae hit. And the same procedure for the other kingdoms
(plantae, fungi, animalia, protista, bacteria, archaea). This is the blastn command line I'm using:
blastn -query sequences.fa -db nt -out blast_out -evalue 0.01 -outfmt '6 qseqid qlen sseqid slen length pident evalue sscinames stitle staxids sskingdoms' -max_target_seqs 1 -num_threads 25
What I've been doing until now is get hit species names, go to NCBI, extract complete taxID of those species, and in the case of "animalia", grep for "33208
" number.
Any suggestion to do this automatically in perl or any script/api available for doing this task?
EDIT:
Maybe I'm not explaining correctly. Let's say I'm working with fungi X specie. I blast the sequences of that fungi against nt, and for example I get this hit for a given sequence:
seq_4459 408 gi|517322946|emb|HF679031.1| 2984819 411 85.40 8e-115 Fusarium fujikuroi IMI 58289 Fusarium fujikuroi IMI 58289 draft genome, chromosome FFUJ_chr09 1279085 Eukaryota
The hit is a fusarium hit, and as I'm working with fungi, I want to keep this line. How can I extract all possible fungi hits, but not other kindoms?
sort both files on the keys and join http://www.computerhope.com/unix/ujoin.htm
Thank you Pierre. I guess that I've not explained the issue correctly. Check the edit if you want, thanks.
Filter your script output to send each kingdom data to its own file. You can do this from within your script, see this.
Thank you Jean-Karim. I guess that I've not explained the issue correctly. Check the edit if you want, thanks.
If I understand correctly, the problem is that you don't get the relevant taxonomic division in the output so for each taxon, you need to find out where in the taxonomic tree it is. Maybe this older post could help you.
Yes, that post has the solution to the problem. Thank you a lot Jean. Another question, do you know the taxonomy ID for protista kingdom? I'm not able to find it in NCBI. Thanks again.
I believe that the kingdom protista is obsolete so you won't find it in a modern taxonomy tree.
OK. I found the solution in another post, sorry, I could not find it before: Does A Script Exist That Given A Species Name Will Give You Kingdom Which The Input Belongs To?
You will have to grab all the TaxID's from fungi and then filter on them. You should be able to get those from Taxdump file from NCBI (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/). Take a look at the Taxdump Readme file.