6.4 years ago
roger.huerlimann
Hi all,
I followed the advice in "Vertebrate Subset Nr Database? Build My Own?" to subset the nr database to a specific taxonomic group.
However, although the GI list contains about 4.8 million GIs, the resulting database has only about 2.2 million sequences. Is this working as intended? The nr database and the GI list were downloaded only a day apart, so I don't think a difference of millions of sequences could have appeared in that time.
>blastdb_aliastool -gilist virus.gi_list180712.txt -db nr -out nr_virus -title nr_virus
Converted 4764026 GIs from virus.gi_list180712.txt to binary format in nr_virus.p.gil
Created protein BLAST (alias) database nr_virus with 2239853 sequences
Thanks!
Roger
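One possible explanation, offered as an assumption rather than a confirmed diagnosis: nr is non-redundant, so identical protein sequences are merged into a single entry that carries several GIs (in the legacy nr FASTA, the merged deflines are joined by a Ctrl-A character, `\x01`). The tool output above counts GIs on one line and sequences on the other, so several GIs collapsing onto one sequence, plus any GIs that are simply absent from nr, would shrink the number. The sketch below uses invented toy deflines and a hypothetical helper `count_gis_and_sequences` to show how one GI list can yield fewer sequences than GIs.

```python
# Hypothetical illustration, not NCBI code. Assumes legacy nr-style deflines
# where merged entries are separated by '\x01' and each part starts 'gi|<number>|'.


def count_gis_and_sequences(deflines, gi_list):
    """Return (GIs found, distinct sequences hit) for a list of GIs."""
    wanted = set(gi_list)
    gis_found = 0
    sequences_hit = 0
    for header in deflines:
        hits = 0
        # One nr sequence may carry several sub-deflines, one per merged GI.
        for sub in header.lstrip(">").split("\x01"):
            fields = sub.split("|")
            if len(fields) > 1 and fields[0] == "gi" and fields[1] in wanted:
                hits += 1
        gis_found += hits
        if hits:  # the sequence is counted once, however many GIs matched
            sequences_hit += 1
    return gis_found, sequences_hit


# Invented toy data: the first entry merges three identical proteins.
toy_nr = [
    ">gi|101|ref|YP_0001.1| capsid protein\x01gi|102|gb|AAA01.1| capsid\x01gi|103|emb|CAA01.1| capsid",
    ">gi|201|ref|YP_0002.1| polymerase",
    ">gi|301|ref|NP_0003.1| unrelated host protein",
]

# GI 999 stands in for a GI that is not in nr at all.
print(count_gis_and_sequences(toy_nr, ["101", "102", "103", "201", "999"]))  # (4, 2)
```

Five GIs in, four found, but only two sequences in the resulting subset: the same kind of shrinkage as 4,764,026 GIs becoming 2,239,853 sequences, if this explanation holds.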
@Roger: NCBI deprecated GIs for outside use a couple of years back. You could use all the viral genomes here, or get the sequences for the taxID you want using NCBI eUtils.
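For the eUtils route, a minimal sketch of the two requests involved: `esearch` finds every protein record under a taxon and stores the result on NCBI's history server, then `efetch` pulls FASTA in batches. The taxid 10239 (Viruses), the `protein` database, the batch size of 500 and the helper names `esearch_url`/`efetch_url` are assumptions for illustration. Only the URLs are built here; sending them, parsing `WebEnv` and `QueryKey` from the esearch XML, and staying within NCBI's request-rate limits are left to you.

```python
# Sketch only: builds E-utilities URLs, performs no network calls.
# Assumed: db="protein", example taxid 10239 (Viruses), batches of 500 records.
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(taxid, db="protein"):
    """URL searching `db` for all records under `taxid`, including child taxa."""
    params = {
        "db": db,
        "term": f"txid{taxid}[Organism:exp]",
        "usehistory": "y",  # keep the result set on the history server
        "retmax": 0,        # only the count, WebEnv and QueryKey are needed
    }
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_url(webenv, query_key, retstart, retmax=500, db="protein"):
    """URL fetching one FASTA batch from a stored history result."""
    params = {
        "db": db,
        "query_key": query_key,
        "WebEnv": webenv,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


print(esearch_url(10239))
```

After running the esearch request, step `retstart` from 0 to the reported count in increments of `retmax`, calling `efetch_url` with the returned `WebEnv` and `QueryKey` each time. The resulting FASTA can then go through `makeblastdb` without needing GIs at all.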