Hi,
I want to run a local BLAST against a specific organism entirely in-house, without using a negative GI list or the -remote option, because the results differ between the online and offline BLAST output for my species of interest.
I have tried everything I could think of, including building a custom database, but the results still differ. I cannot get hold of all the proteins for my species through an NCBI search: for some query sequences, a hit is found that is not in the GI list I obtained from NCBI for my species. I am using BLAST 2.2.30+. Any help or suggestions are welcome.
You need to provide an example, with the taxon ID and a GI that is in nr and belongs to the taxon in question but is not annotated with that taxon ID. How did you generate your positive GI list? (You should not use a negative GI list to extract a single organism from nr, for obvious reasons.)
Thanks for replying, and sorry, I meant a positive GI list in the question. I was running genome BLAST searches among different mycobacterial species. For some sequences, the top hit is to "Mycobacterium tuberculosis complex", whose definition line does not carry the name of any specific organism, so it gets missed in my local BLAST output. I can't find that sequence right now, as it happened for only a few sequences; I will update the post as soon as I find it. Also, the E-value sometimes differs between online and local BLAST, though not by much: for example, 7e-142 in web BLAST comes out as 2e-141 in local BLAST. I think the search space stays the same, so why does it differ?
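A note on the E-value question: for a fixed bit score S', BLAST's expected value is E = N * 2^(-S'), where N is the effective search space, so E scales linearly with N. The sketch below is only an illustration of that relationship; the function names are made up for this example, and it assumes the two runs reported the same bit score for the hit.

```python
def evalue(bit_score, search_space):
    """E-value for a normalized bit score and an effective search space."""
    return search_space * 2.0 ** (-bit_score)


def search_space_ratio(evalue_a, evalue_b):
    """Ratio of effective search spaces implied by two E-values
    for the same hit (same bit score assumed)."""
    return evalue_a / evalue_b


# E-values quoted in the post: 2e-141 (local) vs 7e-142 (web).
ratio = search_space_ratio(2e-141, 7e-142)
print(round(ratio, 2))  # about 2.86
```

Under that assumption, a roughly threefold difference in effective search space would be enough to explain the two numbers, which points to the two databases not being the same size.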
I am also checking my Python code for parsing the BLAST XML results. This is a bit unrelated, but do you know whether the E-value, identities and other measures of the first HSP in an alignment's hsps list in Biopython always correspond to the values for the hit as a whole, even when an alignment has more than one HSP?
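On the HSP question: in BLAST XML output, E-value, identities and alignment length are recorded per HSP, so the first HSP describes only that one segment, not the whole hit. The sketch below uses only the standard library rather than Biopython and parses a hand-written fragment. The element names follow the BLAST XML format, but the scores are invented for illustration and the helper name is hypothetical. In Biopython's NCBIXML parser the corresponding values should be available as `hsp.expect`, `hsp.identities` and `hsp.align_length` on each entry of `alignment.hsps`.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment of one hit with two HSPs (values are illustrative).
HIT_XML = """<Hit>
  <Hit_id>gi|500123058|ref|WP_011799063.1|</Hit_id>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_evalue>1e-70</Hsp_evalue>
      <Hsp_identity>120</Hsp_identity>
      <Hsp_align-len>150</Hsp_align-len>
    </Hsp>
    <Hsp>
      <Hsp_num>2</Hsp_num>
      <Hsp_evalue>3e-12</Hsp_evalue>
      <Hsp_identity>40</Hsp_identity>
      <Hsp_align-len>60</Hsp_align-len>
    </Hsp>
  </Hit_hsps>
</Hit>"""


def per_hsp_stats(hit):
    """Return one dict of statistics per HSP of a <Hit> element."""
    stats = []
    for hsp in hit.iter("Hsp"):
        stats.append({
            "num": int(hsp.findtext("Hsp_num")),
            "evalue": float(hsp.findtext("Hsp_evalue")),
            "identities": int(hsp.findtext("Hsp_identity")),
            "align_len": int(hsp.findtext("Hsp_align-len")),
        })
    return stats


hit = ET.fromstring(HIT_XML)
stats = per_hsp_stats(hit)

# Pick the best HSP explicitly instead of relying on list order.
best = min(stats, key=lambda s: s["evalue"])

# Hit-wide numbers have to be aggregated over all HSPs.
total_identities = sum(s["identities"] for s in stats)
total_aligned = sum(s["align_len"] for s in stats)

print(best["num"], best["evalue"], total_identities, total_aligned)
```

BLAST usually lists the best-scoring HSP first, but selecting it with `min` on the E-value avoids depending on that ordering.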
Which taxid does this M. tuberculosis complex entry have? You need to look at the assigned taxids only; don't try to parse the species from the description, as that is too error-prone. To generate your GI list, take the Mycobacterium taxid (1763) and extract all GIs that are annotated at this level or below (species, strains, etc.). That is what the web BLAST interface does as well, and it might explain the discrepancy in database size and hence in E-values.
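The "this level or below" selection can be sketched as a walk up the taxonomy tree from each GI's assigned taxid. Everything in the tables below is a toy stand-in: the lineage is simplified, GIs 111 and 222 are made up, and the taxid given for gi 500123058 (77643, M. tuberculosis complex) is an assumption for illustration. In practice the parent map could be loaded from the NCBI taxonomy dump (nodes.dmp), or the list could be fetched with an Entrez query such as `txid1763[Organism:exp]`.

```python
# Child taxid -> parent taxid (simplified toy lineage).
PARENT = {
    83332: 1773,   # M. tuberculosis H37Rv -> M. tuberculosis
    1773: 77643,   # M. tuberculosis -> M. tuberculosis complex
    77643: 1763,   # M. tuberculosis complex -> genus Mycobacterium
    562: 561,      # Escherichia coli -> genus Escherichia
}

# GI -> assigned taxid (toy data, see the note above).
GI_TO_TAXID = {
    500123058: 77643,  # assumed taxid for the entry discussed in this thread
    111: 83332,        # hypothetical GI annotated at strain level
    222: 562,          # hypothetical GI from an unrelated organism
}


def is_at_or_below(taxid, root, parent):
    """True if taxid equals root or has root among its ancestors."""
    seen = set()
    while taxid is not None and taxid not in seen:
        if taxid == root:
            return True
        seen.add(taxid)          # guard against cycles in bad input
        taxid = parent.get(taxid)
    return False


def gis_under(root, gi_to_taxid, parent):
    """Sorted GIs whose assigned taxid lies at or below root."""
    return sorted(
        gi for gi, taxid in gi_to_taxid.items()
        if is_at_or_below(taxid, root, parent)
    )


gi_list = gis_under(1763, GI_TO_TAXID, PARENT)
print("\n".join(str(gi) for gi in gi_list))  # one GI per line, as a GI list file expects
```

Selecting by taxid this way picks up the complex-level entry even though its description names no specific species, which a name-based filter would miss.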
For example, accession number WP_011799063.1, whose organism name is itself "Mycobacterium tuberculosis complex". I am extracting my GI list through the Taxonomy Browser only.
This should work. It is gi:500123058 and is annotated with a correct taxid; you should have it in your GI list if you use taxid 1763 (Mycobacterium) to generate it.
Ok, I will try again, and thanks for your help!