local blastn against one subgroup of species
7.1 years ago
jf • 0

Hello, thanks to the great resources on this forum, I was able to run blastn with a .fa file against a local BLAST database. The issue I am having is that the pipeline searches against every available sequence in the database. Even with the option "-qcov_hsp_perc 0.95" I get good coverage, but species such as bacteria and birds are still being searched against.

Going through the NCBI genome site, I see that species are broken down into groups such as "Kingdom: Eukaryota; Subgroup: Fishes". With this in mind, I would like to limit my queries to a specific subgroup or subgroups rather than the entire database.

I did not see anything like this in the blastn options, yet I am sure there is a way to limit the groups a file is searched against.

Please let me know your ideas so I can give them a try.


Rebuild the database from only the subset you are interested in. If this is not feasible, restrict the BLAST search with -gilist by providing the GIs of the subset sequences. Refer to the manual for a detailed explanation.
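A rough sketch of the "rebuild the database" route, assuming BLAST+ is installed and that you already have a file of wanted identifiers. The names mydb, subset.ids, subset.fa and subset_db are placeholders, and the run helper is hypothetical: it only echoes each command and executes it if the tool is found on the PATH.

```shell
#!/bin/sh
# Sketch: build a smaller database from a list of wanted identifiers.
# mydb, subset.ids, subset.fa and subset_db are placeholder names.

run() {
    # Print the command, then execute it only if the tool is installed.
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "command failed: $1" >&2
    fi
}

# 1. Pull the wanted sequences out of the existing database as FASTA.
run blastdbcmd -db mydb -entry_batch subset.ids -out subset.fa

# 2. Index them as a new nucleotide database.
run makeblastdb -in subset.fa -dbtype nucl -parse_seqids -out subset_db
```

Searching the smaller database would then be an ordinary blastn call with -db subset_db. Rebuilding costs disk space and time, which is why the list-based restriction may be preferable for a large database.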


Restricting the BLAST search with a list of IDs is a good way to go. I suggest using -seqidlist instead of -gilist, since NCBI is not going to maintain GIs as primary identifiers for sequence records. PS: a seqidlist is a list of accession.version identifiers.

How to generate the needed seqidlist is not yet clear (you need to define your search criterion).
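One possible way to build such a list, if the search criterion is a set of taxids: dump accession and taxid pairs from the database with blastdbcmd (%a and %T are its outfmt codes for accession and taxid), then keep the accessions whose taxid is wanted. This is only a sketch. It assumes the database carries taxonomy information, and the database name mydb, the file names, and the accessions in the sample are placeholders. The here-document stands in for real blastdbcmd output so the filtering step can be tried without a database.

```shell
#!/bin/sh
# Placeholder sample of what
#   blastdbcmd -db mydb -entry all -outfmt "%a %T"
# would print: one "accession.version taxid" pair per line.
# The accessions below are invented for illustration.
cat > acc_taxid.txt <<'EOF'
SEQ_FISH.1 7955
SEQ_BACT.1 562
SEQ_BIRD.1 9031
EOF

# Taxids to keep, one per line (7955 is Danio rerio, chosen as an example).
printf '7955\n' > wanted_taxids.txt

# First file fills the lookup table; second file is filtered against it.
awk 'NR == FNR { keep[$1]; next } ($2 in keep) { print $1 }' \
    wanted_taxids.txt acc_taxid.txt > subset.seqidlist

cat subset.seqidlist
```

The resulting file would then be passed to the search, for example: blastn -db mydb -query query.fa -seqidlist subset.seqidlist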


Interesting idea. If I run "blastn -seqidlist" on its own I get this error: "Error: Argument "-seqidlist". Value is missing", so I need to add a file path after the option. My question is: does the file need to include accessions such as "NC_" or "NG_"? I think the available literature supports running -gilist instead, because that method seems to be a bit more established.

I would follow this command line:

Query a BLAST database with a GI, but exclude that GI from the results

Extract a GI from the ecoli database:

    $ blastdbcmd -entry all -db ecoli -dbtype nucl -outfmt %g | head -1 | \
      tee exclude_me
    1786181

Run the restricted database search, which shows there are no self-hits:

    $ blastn -db ecoli -negative_gilist exclude_me -show_gis -num_alignments 0 \
      -query exclude_me | grep `cat exclude_me`
    Query= gi|1786181|gb|AE000111.1|AE000111

I am just having a hard time isolating a list of GIs that I can screen.

Is anyone familiar with a source that maps species to their corresponding GIs, so I can create an appropriate inclusion or exclusion file? I did not find anything in the manual or appendices.
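The database itself may be able to serve as that source: blastdbcmd can print the GI (%g), taxid (%T) and scientific name (%S) of each entry, provided the database was built with GIs and taxonomy information. Below is a sketch of turning such a table into inclusion and exclusion files by species name. The GIs in the sample are invented, and mydb, the file names and the species chosen are placeholders.

```shell
#!/bin/sh
# Placeholder sample of what
#   blastdbcmd -db mydb -entry all -outfmt "%g %T %S"
# would print: "GI taxid scientific name" per line. GIs are invented.
cat > gi_species.txt <<'EOF'
1000001 7955 Danio rerio
1000002 562 Escherichia coli
1000003 9031 Gallus gallus
1000004 7955 Danio rerio
EOF

species='Danio rerio'   # example species to keep

# Inclusion file: GIs of entries matching the species (for -gilist).
grep -F " $species" gi_species.txt | cut -d' ' -f1 > include.gi

# Exclusion file: GIs of everything else (for -negative_gilist).
grep -v -F " $species" gi_species.txt | cut -d' ' -f1 > exclude.gi

echo "include:"; cat include.gi
echo "exclude:"; cat exclude.gi
```

Either file, not both, would then go to the search: -gilist include.gi keeps only the listed entries, while -negative_gilist exclude.gi drops the listed ones. Swapping %g for %a in the outfmt string would give accessions for -seqidlist instead.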
