Hello, thanks to the great resource of this forum I was able to blastn a .fa file against a local BLAST database. The issue I am having is that the pipeline is blasting against every available sequence in the database. Even with the option "-qcov_hsp_perc 0.95" I am getting good coverage, but species like bacteria and birds are still being searched against.
Going through the NCBI genome site I see that species are broken down into groups such as "Kingdom: Eukaryota; Subgroup: Fishes". With this in mind, I would like to limit my queries to a specific subgroup or subgroups rather than the entire database.
I did not see anything like this in the blastn options, yet I am sure there is a way to limit the groups a file is blasted against.
Please let me know your ideas so I can give them a try.
Rebuild the database for only the subset you are interested in. If this is not feasible, restrict the BLAST search with -gilist by providing the GI numbers of the subset sequences. Refer to the manual for a detailed explanation.
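A minimal sketch of the rebuild route, assuming the subset is already available as a file of sequence identifiers, one per line. The helper name build_subset_db and the names nt, fish_ids.txt and fish_db are placeholders, not anything from this thread, and looking sequences up by identifier generally requires a source database that was built with -parse_seqids.

```shell
# Build a smaller BLAST database from a list of sequence identifiers.
# Hypothetical helper: $1 = source db, $2 = id list file, $3 = new db name.
build_subset_db() {
    src_db=$1
    id_list=$2
    new_db=$3

    # Pull the listed sequences out of the source database as FASTA.
    blastdbcmd -db "$src_db" -dbtype nucl -entry_batch "$id_list" \
        -out "$new_db.fa" || return 1

    # Index the extracted FASTA as a new nucleotide database.
    makeblastdb -in "$new_db.fa" -dbtype nucl -parse_seqids -out "$new_db"
}

# Example call (placeholder names; needs BLAST+ and a local database):
#   build_subset_db nt fish_ids.txt fish_db
#   blastn -query query.fa -db fish_db -out hits.txt
```

The rebuilt database only needs to be made once, after which every search against it is restricted to the subset without extra options.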
Restricting the BLAST search with a list of IDs is a good way to go. I suggest using -seqidlist instead of -gilist, since NCBI is not going to maintain GIs as the primary identifiers for sequence records. PS: the seqidlist is a list of accession.version identifiers.
How to generate the needed seqidlist is not clear yet (you need to define your search criterion).
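One possible way to generate the list, sketched under the assumption that the search criterion is a set of NCBI taxonomy IDs and that the local database carries taxonomy information: dump an accession/taxid table with blastdbcmd, keep the accessions whose taxid is wanted, and pass the result to -seqidlist. The helper filter_accessions and all file names are made up for illustration, and the list of wanted taxids has to come from elsewhere (for example the NCBI Taxonomy site).

```shell
# Print accessions (column 1) whose taxid (column 2) is in the whitelist.
# Hypothetical helper: $1 = file of wanted taxids, one per line,
#                      $2 = whitespace-separated "accession taxid" table.
filter_accessions() {
    awk 'NR == FNR { keep[$1] = 1; next } ($2 in keep) { print $1 }' "$1" "$2"
}

# Example run (placeholder names; needs BLAST+ and a local database):
#   blastdbcmd -db nt -dbtype nucl -entry all -outfmt "%a %T" > acc_taxid.txt
#   filter_accessions wanted_taxids.txt acc_taxid.txt > subset.seqidlist
#   blastn -query query.fa -db nt -seqidlist subset.seqidlist -out hits.txt
```

Before running the search it is worth checking that the accessions in subset.seqidlist are in the accession.version form that -seqidlist expects.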
Interesting idea. If I run "blastn -seqidlist" I get this error: "Error: Argument "-seqidlist". Value is missing", so I need to add a file path after the option. My question is: does the file need to include identifiers with the "NC_" or "NG_" prefix? I think the available literature supports running -gilist instead, because that method seems to be a bit more established.
I would follow this command line:
Query a BLAST database with a GI, but exclude that GI from the results

Extract a GI from the ecoli database:

    $ blastdbcmd -entry all -db ecoli -dbtype nucl -outfmt %g | head -1 | \
      tee exclude_me
    1786181

Run the restricted database search, which shows there are no self-hits:

    $ blastn -db ecoli -negative_gilist exclude_me -show_gis -num_alignments 0 \
      -query exclude_me | grep `cat exclude_me`
    Query= gi|1786181|gb|AE000111.1|AE000111

I am just having a hard time isolating a list of gi's that I can screen.
Is anyone familiar with a source that lists species and their corresponding GIs, so I can create an appropriate inclusion or exclusion file? I did not find anything in the manual or appendices.