Local Blast against a specific organism
9.9 years ago
Ritvik ▴ 30

Hi,

I want to run a local BLAST restricted to a specific organism, entirely in-house, without using a negative GI list or the -remote option, because the results for my species of interest differ between the online and offline BLAST output.

I have tried everything I could think of, including building a custom database, but the results still differ. I cannot get hold of every protein for my species through an NCBI search: for some query sequences, a hit is found that is not in the GI list I obtained from NCBI for my species. I am using BLAST 2.2.30+. Any help or suggestions are welcome.

blast • 5.7k views

You need to provide an example: the taxon ID, plus a GI that is in nr and belongs to the taxon in question but is not annotated with that taxon ID. How did you generate your positive GI list? (You should not use a negative GI list to extract a single organism from nr, for obvious reasons.)


Thanks for replying, and sorry, I meant positive GI list in the question. I was running genome BLAST searches among different mycobacterial species. For some sequences the top hit is to a Mycobacterium tuberculosis complex entry whose definition line does not name any specific organism, so it gets missed in my local BLAST output. I can't find that sequence right now, as it happened for only a few sequences; I will update as soon as I find it. Also, the e-value sometimes differs between online and local BLAST, though not by much: for example, 7e-142 in web BLAST comes out as 2e-141 in local BLAST. I think the search space stays the same, so why does it differ?

I am also checking my Python code for parsing the BLAST XML results. This is a bit unrelated, but do you know whether the e-value, identities and other measures of the first HSP in an alignment's hsps list in Biopython always correspond to the values for the hit, even when the alignment has more than one HSP?
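On the HSP question, one way to avoid depending on HSP order is to collect every HSP of a hit and pick the best one explicitly. In BLAST XML output the e-value, identities and scores are stored on each Hsp element inside a Hit, so a hit with several HSPs carries several sets of values. The sketch below uses only the Python standard library rather than Biopython; the sample XML is a hand-made fragment with invented values, and `best_hsp_per_hit` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-made fragment following the BLAST XML layout; all values are invented.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>example_hit</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-70</Hsp_evalue>
              <Hsp_identity>120</Hsp_identity>
              <Hsp_align-len>150</Hsp_align-len>
            </Hsp>
            <Hsp>
              <Hsp_evalue>3e-12</Hsp_evalue>
              <Hsp_identity>40</Hsp_identity>
              <Hsp_align-len>60</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def best_hsp_per_hit(xml_text):
    """Return {hit_id: HSP dict} keeping the lowest e-value HSP of each hit."""
    root = ET.fromstring(xml_text)
    best = {}
    for hit in root.iter("Hit"):
        hsps = []
        for hsp in hit.iter("Hsp"):
            hsps.append({
                "evalue": float(hsp.findtext("Hsp_evalue")),
                "identity": int(hsp.findtext("Hsp_identity")),
                "align_len": int(hsp.findtext("Hsp_align-len")),
            })
        if hsps:
            # Choose explicitly instead of assuming the first HSP is the best.
            best[hit.findtext("Hit_id")] = min(hsps, key=lambda h: h["evalue"])
    return best


result = best_hsp_per_hit(SAMPLE_XML)
```

The same idea should carry over to a Biopython parse by taking the minimum over the HSP list of each alignment, but check that against your Biopython version.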


Which taxid does this M. tuberculosis complex entry have? You need to look at the assigned taxids only; don't try to parse the species from the description, that is too error-prone. To generate your GI list, take the Mycobacterium taxid (1763) and extract all GIs that are annotated at this level or below (species, strains, etc.). That is what the web BLAST interface does as well, and it might explain the discrepancy in database size and hence in e-values.
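As a rough sketch of that approach: an Entrez term of the form `txid1763[Organism:exp]` matches the taxon and everything below it, and the IDs it returns for the protein database can be saved one per line and passed to BLAST+ with `-gilist`. The code below only builds the E-utilities search URL and the command line, so it makes no network calls. The helper names, the file names and the choice of `blastp` with XML output are assumptions for illustration; a large taxon also needs paging with `retstart`, which is not shown.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxon_query(taxid):
    """Entrez term matching the taxon and all taxa below it."""
    return "txid{}[Organism:exp]".format(int(taxid))


def esearch_url(taxid, retmax=10000):
    """URL for an esearch request against the protein database (not fetched here)."""
    params = {"db": "protein", "term": taxon_query(taxid), "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def blastp_command(query_fasta, gilist_path, out_path, db="nr"):
    """Argument list for a local blastp run restricted to the GIs in gilist_path."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-gilist", gilist_path,  # positive GI list, one GI per line
        "-outfmt", "5",          # XML output
        "-out", out_path,
    ]


# Hypothetical file names, for illustration only.
url = esearch_url(1763)
cmd = blastp_command("query.faa", "mycobacterium.gi", "hits.xml")
```

The argument list can be passed to `subprocess.run(cmd, check=True)` once the GI list file exists.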


For example, accession number WP_011799063.1, whose organism name is itself Mycobacterium tuberculosis complex. I am extracting my GI list through the Taxonomy Browser only.


This should work: it is gi:500123058 and is annotated with a correct taxid, so you should have it in your GI list if you use taxid 1763 (Mycobacterium) to generate it.


OK, I will try again. Thanks for your help!

9.9 years ago
Peter 6.0k

Currently, I think the best option is to make your own organism-specific database, using an appropriate set of gene calls for your organism. This will also make the searches faster, since the database is smaller (but remember that the e-value depends on the database size).
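To illustrate the point about database size: in the usual Karlin-Altschul approximation, the e-value for a given bit score is roughly the search space (query length times database length) multiplied by 2 to the power of minus the bit score. The sketch below ignores the effective-length corrections that BLAST applies, and all numbers are invented, so treat it as a scaling argument only.

```python
def expected_evalue(bit_score, query_len, db_len):
    """Approximate e-value: E = m * n * 2**(-S'), without length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Same alignment (same bit score), two database sizes; values are made up.
e_small = expected_evalue(500.0, 400, 1_000_000_000)
e_large = expected_evalue(500.0, 400, 3_000_000_000)
ratio = e_large / e_small  # the e-value grows in proportion to the database size
```

So the same alignment gets a different e-value when the same query is searched against databases of different total length.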

If you stick with a big NCBI database like NR, you waste compute time by searching everything and then filtering, but it works. You can do this via an Entrez query with -remote, as you mention, or you can ask for the taxonomy information in the tabular or CSV output and post-process it to filter locally. See http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html
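A minimal sketch of the local post-processing route, assuming the search was run with `-outfmt "6 std staxids"` so that each row ends with the subject taxid(s), separated by semicolons when there are several. The function name, the sample rows and the set of allowed taxids are placeholders for illustration; in practice the allowed set would be every taxid at or below your taxon of interest.

```python
# Column order for -outfmt "6 std staxids".
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore", "staxids",
]


def filter_hits_by_taxid(tabular_text, allowed_taxids):
    """Keep rows whose staxids field contains at least one allowed taxid."""
    allowed = {str(t) for t in allowed_taxids}
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, line.split("\t")))
        hit_taxids = set(record.get("staxids", "").split(";"))
        if hit_taxids & allowed:
            kept.append(record)
    return kept


# Invented rows: one subject annotated with taxid 1773, one with taxid 562.
SAMPLE = (
    "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\t1773\n"
    "q1\tsubjB\t45.0\t180\t90\t4\t5\t184\t10\t189\t2e-20\t95.1\t562\n"
)
kept = filter_hits_by_taxid(SAMPLE, {1773})
```

This still searches the whole database, so it costs more compute time than restricting the search up front.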

I would like the NCBI to support taxonomy filtering directly in BLAST+. This was number 2 on my BLAST+ Christmas Wish List 2014: http://blastedbio.blogspot.co.uk/2014/12/blast-christmas-wish-list.html
