blastdb_aliastool mismatch converted GIs and final sequences
6.4 years ago

Hi all,

I followed the advice in the thread "Vertebrate Subset Nr Database? Build My Own?" to subset the nr database to a specific taxonomic group.

However, although the GI list contained about 5 million GIs, the resulting database has only about 2 million sequences. Is this working as intended? I downloaded the nr database and the GI list only a day apart, so I don't think database updates in that window can account for the 3 million missing sequences.

>blastdb_aliastool -gilist virus.gi_list180712.txt -db nr -out nr_virus -title nr_virus
Converted 4764026 GIs from virus.gi_list180712.txt to binary format in nr_virus.p.gil
Created protein BLAST (alias) database nr_virus with 2239853 sequences
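One possible reason, offered as an assumption rather than a confirmed answer: nr is non-redundant, so identical protein sequences from different records are merged into a single entry whose FASTA header lists the deflines (and GIs) of every merged record. Many GIs can therefore collapse onto one sequence, and some listed GIs may also be absent from the nr release used. The hypothetical Python sketch that follows counts how many listed GIs appear in a set of FASTA headers and how many distinct records contain them. It assumes the older GI-style nr FASTA, where each defline begins with "gi|<number>|" and merged deflines are joined by the Ctrl-A (\x01) character; the function names are made up for illustration.

```python
# Hypothetical diagnostic: how many distinct nr records do the listed GIs hit?
# Assumes old-style nr FASTA headers: deflines start with "gi|<number>|" and
# the deflines of merged identical sequences are joined by Ctrl-A ("\x01").
import re
from typing import Iterable, Iterator, Set, Tuple

GI_RE = re.compile(r"gi\|(\d+)\|")
SOH = "\x01"  # separator between merged deflines in one nr header


def fasta_headers(path: str) -> Iterator[str]:
    """Yield every header line ('>' included) from a FASTA file."""
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                yield line.rstrip("\n")


def gis_in_header(header: str) -> Set[int]:
    """Return every GI mentioned in one (possibly merged) FASTA header."""
    gis = set()
    for defline in header.lstrip(">").split(SOH):
        m = GI_RE.match(defline)
        if m:
            gis.add(int(m.group(1)))
    return gis


def count_hits(headers: Iterable[str], wanted: Set[int]) -> Tuple[int, int]:
    """Return (listed GIs found, distinct records containing at least one)."""
    found_gis: Set[int] = set()
    records = 0
    for header in headers:
        hit = gis_in_header(header) & wanted
        if hit:
            found_gis |= hit
            records += 1
    return len(found_gis), records
```

For example, with one merged header carrying GIs 1, 2 and 3, a second header carrying GI 4, and a list {1, 2, 3, 4, 99}, count_hits reports 4 GIs found in only 2 records, the same kind of gap as in the output above.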

Thanks!

Roger

@Roger: NCBI deprecated GIs for outside use a couple of years back. You could use all the viral genomes here, or retrieve the sequences for the taxID you want using the NCBI E-utilities.
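A minimal sketch of the E-utilities route, assuming the usual ESearch + EFetch workflow through the History server: ESearch finds every protein under a taxon (the "[Organism:exp]" qualifier includes descendant taxa), then EFetch downloads the hits as FASTA in batches. The function names, the batch size of 500 and the 0.4 s pause are illustrative choices, not part of the reply; NCBI asks users without an API key to stay at or below about 3 requests per second.

```python
# Sketch: download protein FASTA for all records under one NCBI taxonomy ID
# via ESearch (with usehistory=y) followed by batched EFetch calls.
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(taxid: int, db: str = "protein") -> str:
    """ESearch URL for all records under `taxid`, kept on the History server."""
    params = {"db": db, "term": f"txid{taxid}[Organism:exp]", "usehistory": "y"}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_url(webenv: str, query_key: str, retstart: int,
               retmax: int = 500, db: str = "protein") -> str:
    """EFetch URL for one FASTA batch of a stored History-server result."""
    params = {
        "db": db,
        "query_key": query_key,
        "WebEnv": webenv,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def fetch_taxon_fasta(taxid: int, out_path: str, batch: int = 500) -> int:
    """Write FASTA for every hit to `out_path`; return the search's hit count."""
    with urlopen(esearch_url(taxid)) as resp:
        root = ET.fromstring(resp.read())
    count = int(root.findtext("Count"))
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    with open(out_path, "w") as out:
        for start in range(0, count, batch):
            with urlopen(efetch_url(webenv, query_key, start, batch)) as resp:
                out.write(resp.read().decode())
            time.sleep(0.4)  # stay under ~3 requests/second without an API key
    return count
```

Usage might look like fetch_taxon_fasta(10239, "viral_proteins.fasta"), where 10239 is the taxonomy ID for Viruses; the resulting FASTA can then be turned into a BLAST database with makeblastdb. For a taxon this large the download is long, so the batch loop is kept simple rather than retrying failed requests.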
