Hello,
I am running blastn, blastx and tblastx searches against NCBI's nt, est, nr and HTGS databases using transcriptome data containing ~56,000 contigs. I have written Biopython scripts that run these searches and retrieve only the queries with no matches. Now I would like to retrieve the queries that have hits only in the taxon 'Caudata', i.e. protein-coding transcripts unique to urodeles. Is there a specific boolean query I can put in the entrez_query parameter of the qblast function to do this? Or will I have to do something more intensive, such as running the search separately for each taxon and finding the queries that only have hits in Caudata?
Thanks for any help,
Regards, James
Could you clarify whether you are doing this via the standalone legacy BLAST tools (i.e. the blastall binary) or the standalone BLAST+ tools (i.e. the blastn, blastx and tblastx binaries)? If so, has the database been installed locally on your computer, or are you using the -remote option to run the search on the NCBI servers? Thanks!
Hi Peter, I'm doing this via Biopython, using the NCBIWWW.qblast function to run the searches. What I've since discovered is that I'll probably have to retrieve the taxon ID from the GI numbers of the BLAST hits and then script a condition along these lines: if the significant BLAST hits for a query carry taxon IDs only from Caudata, keep the query; otherwise remove it. Please also see my reply to Jordan below for a better explanation of my problem. Thanks
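For what it's worth, the keep/remove condition described above could be sketched roughly like this. It is only a minimal sketch under some assumptions: the taxon ID of each significant hit has already been looked up, and a lineage (the set of ancestor taxon IDs) is available for each of those taxa, for example from the NCBI Taxonomy database. The function name, the input dictionaries and the toy data are all hypothetical, and the Caudata taxon ID used here (8293) should be checked against NCBI Taxonomy before relying on it.

```python
# Assumed NCBI Taxonomy ID for Caudata; verify before use.
CAUDATA_TAXID = 8293


def caudata_only_queries(hits_by_query, lineage_by_taxid, clade=CAUDATA_TAXID):
    """Return the query IDs whose significant hits all fall inside `clade`.

    hits_by_query    -- dict: query ID -> list of taxon IDs of its significant hits
    lineage_by_taxid -- dict: taxon ID -> collection of taxon IDs in its lineage
                        (the taxon itself plus its ancestors)
    Both inputs are hypothetical structures built beforehand from the BLAST
    results and a taxonomy lookup; they are not produced by qblast itself.
    """
    kept = []
    for query, hit_taxids in hits_by_query.items():
        if not hit_taxids:
            # No significant hits at all: not evidence of a Caudata-only match.
            continue
        # A hit with an unknown lineage is treated as outside the clade.
        if all(clade in lineage_by_taxid.get(taxid, ()) for taxid in hit_taxids):
            kept.append(query)
    return kept


# Toy data with simplified, illustrative lineages (not real taxonomy records).
lineages = {
    8296: {8292, 8293, 8296},  # a salamander species, inside Caudata
    9606: {9604, 9605, 9606},  # a non-caudate species
}
hits = {
    "contig_1": [8296, 8296],  # hits only within Caudata -> keep
    "contig_2": [8296, 9606],  # also hits outside Caudata -> remove
    "contig_3": [],            # no significant hits -> remove
}

print(caudata_only_queries(hits, lineages))
```

Treating hits with an unknown lineage as "outside Caudata" is a conservative choice here; depending on the data it may be preferable to flag those queries for manual review instead.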