9.5 years ago
Prasad
Hello all,
I am trying to retrieve all sequences from the NCBI nr database using the following command:
blastdbcmd -entry 'all' -db Path/to/db -outfmt '%f' -out output.fasta
It starts writing the FASTA file, but the command fails after some time with this error:
Error: CSeqDBAtlas::MapMmap: While mapping file [/mnt/LV1/blast_db/nr_nt/nr.07.psq] with 12898483378 bytes allocated, caught exception:
NCBI C++ Exception:
"/build/buildd/ncbi-blast+-2.2.28/c++/src/objtools/blast/seqdb_reader/seqdbatlas.cpp", line 152: Error: ncbi::SeqDB_ThrowException() - Validation failed: [end <= file_size] at /build/buildd/ncbi-blast+-2.2.28/c++/src/objtools/blast/seqdb_reader/seqdbatlas.cpp:506
Has anyone faced this problem? If so, how did you fix it?
Any help is appreciated.
Thanks
I am trying to compare two nr database versions. I want to get the GI (GenInfo Identifier) numbers from both databases so that I can BLAST sequences only against the GIs that are new. For the latest nr database I could download the FASTA file and grep all the GIs from it. But for the older nr database there is no FASTA file, so at first I tried to get just the GIs from the nr database using
That process is taking forever: by my calculations it will take around 27 days to produce gi_id_list.txt. That's why I was trying to dump every sequence in nr to FASTA, so that I could quickly grep the GIs out of it.
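For the grep step, here is a minimal Python sketch of pulling GIs out of FASTA definition lines. It assumes the old-style `gi|<number>|...` headers, and it assumes that nr merges identical sequences into one header whose entries are separated by Control-A (`\x01`) characters, so a single header can carry many GIs. It streams line by line, so memory stays flat however large the file is. The function name `iter_gis` is hypothetical, not part of any BLAST tool.

```python
import re
from typing import Iterable, Iterator

# Matches a "gi|<digits>" prefix at the start of one defline entry.
GI_PATTERN = re.compile(r"gi\|(\d+)")


def iter_gis(lines: Iterable[str]) -> Iterator[int]:
    """Yield every GI found in the FASTA definition lines of `lines`.

    Assumes nr-style headers: several "gi|...|" entries for identical
    sequences joined by Control-A (\\x01) within one ">" line.
    """
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, no identifiers here
        header = line[1:].rstrip("\n")
        for entry in header.split("\x01"):
            match = GI_PATTERN.match(entry.lstrip())
            if match:
                yield int(match.group(1))
```

Usage: pass an open file handle, e.g. `with open("nr.fasta") as fh: ...` or `gzip.open("nr.gz", "rt")`, and write each yielded GI on its own line. This still has to read every header once, so it is bounded by disk I/O rather than by anything clever.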
Is there any quicker way to get all the GIs from an older nr database?
Look at the thread "Is there any BLAST database archive?" for an alternative approach.
Sounds good to me. I will give this approach a try.
The blastdbcmd documentation lays out the available output options; I know it is possible to output only IDs, accessions, etc.
Yeah, blastdbcmd does output just the IDs. I did try
which outputs only the IDs, but as I said in my earlier comment, it would take forever to finish.
I am looking for a faster way to get the GI list from the nr database. As of now, the quickest way I see is to extract the GIs from a FASTA file.
I think either way will take a really long time unless you can fit the whole file into memory. Getting the GIs from the FASTA file requires parsing every FASTA definition line, which might slow you down.
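On the memory point: once there is one GI list per nr version, the original goal (BLAST only against new GIs) is a set difference. Rather than loading every GI into a Python set, this sketch assumes both lists were first sorted numerically (for example with `sort -n`, which GNU sort can do on files larger than RAM) and walks them in a single merge pass, holding one value from each list at a time. The names `new_ids` and `read_ids` and the one-GI-per-line file layout are assumptions for illustration.

```python
from typing import Iterable, Iterator


def read_ids(path: str) -> Iterator[int]:
    """Stream integer IDs from a text file with one ID per line."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield int(line)


def new_ids(old_sorted: Iterable[int], new_sorted: Iterable[int]) -> Iterator[int]:
    """Yield IDs in new_sorted that are absent from old_sorted.

    Both inputs must be sorted ascending. Each input is read once, like
    the merge step of merge sort, so memory use does not grow with size.
    """
    old_iter = iter(old_sorted)
    old = next(old_iter, None)
    last = None
    for gi in new_sorted:
        if gi == last:
            continue  # skip duplicate IDs in the new list
        last = gi
        # Advance the old list until it reaches or passes the current ID.
        while old is not None and old < gi:
            old = next(old_iter, None)
        if old != gi:
            yield gi
```

Usage: `for gi in new_ids(read_ids("old_gis.txt"), read_ids("new_gis.txt")): out.write(f"{gi}\n")`. A one-GI-per-line file like that is the kind of input `blastdbcmd -entry_batch` reads, if you then want to extract just those sequences.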