I have downloaded the NCBI nt database using the update_blastdb.pl Perl script, but I want to BLAST a query file against specific species only rather than against the whole nt database. I know that when using BLAST locally it is possible to subset the nt/nr database using a list of GI identifiers, as explained here.
However, NCBI is phasing out GIs and we should instead use accession.version identifiers. I have downloaded those for my species; part of the file mygi.txt is shown below.
When I run
blastdb_aliastool -gilist mygi.txt -db nt -out sthg.out -title sometitle
I obviously get
BLAST Database error: Specified file is not a valid GI/TI list.
since I am not providing a GI list.
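The error makes sense: a text GI list is expected to contain only integers, one per line, whereas accession.version identifiers contain letters and a dot. As a minimal sketch of that distinction (the function name looks_like_gi_list is made up for illustration, and this is only a plain-text check, not what BLAST does internally):

```shell
# Succeed (exit status 0) only if every line of the file is a bare integer,
# which is what a plain-text GI list looks like.
# Note: empty lines are tolerated, and Windows line endings will make it fail.
looks_like_gi_list() {
    # grep -v selects lines that are NOT purely digits; -q only sets the status.
    # Negating with ! means "no offending line was found".
    ! grep -qvE '^[0-9]*$' "$1"
}

# Usage:
#   looks_like_gi_list mygi.txt && echo "GI list" || echo "not a GI list"
```

Run on my mygi.txt this reports that the file is not a GI list, which matches the message from blastdb_aliastool.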
I cannot find any command-line option in the manual for filtering the nt database by accession number. Any idea how I can achieve that? I bet this option will have to be added by the BLAST team at some point :)
mygi.txt (excerpt):
AF324813.1
AF324814.1
AF324815.1
AF324816.1
AF324817.1
AF324818.1
AF324819.1
AF324820.1
AF324821.1
AF324822.1
AF324823.1
AF324824.1
AF370451.1
AY198341.1
AY198342.1
An alternative (and dirtier ;) ) possibility could be using this, then running makeblastdb and BLASTing against the newly created database.
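For what it is worth, here is a rough sketch of that kind of workaround. It is an assumption on my part that the sequences can be pulled straight out of the local nt database with blastdbcmd rather than fetched by other means; the function name build_subset_db and the names nt_subset, query.fasta and results.txt are placeholders, not anything prescribed by BLAST.

```shell
# Sketch of the "extract, then rebuild" workaround.
# Assumes BLAST+ is on PATH and the local nt database can be found
# (e.g. via the BLASTDB environment variable).
build_subset_db() {
    acc_list=$1   # one accession.version per line, e.g. mygi.txt
    out_db=$2     # name of the new database (hypothetical, e.g. nt_subset)

    # Pull the listed sequences out of nt as FASTA.
    blastdbcmd -db nt -entry_batch "$acc_list" -out "$out_db.fasta" || return 1

    # Build a nucleotide database from the extracted sequences.
    makeblastdb -in "$out_db.fasta" -dbtype nucl -parse_seqids \
        -out "$out_db" -title "$out_db"
}

# Usage (file names are placeholders):
#   build_subset_db mygi.txt nt_subset
#   blastn -query query.fasta -db nt_subset -out results.txt
```

The obvious drawback is that the subset is a second copy of the sequences on disk and has to be rebuilt whenever nt is updated, which is why I would still prefer to filter nt in place.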