BLAST reference genome indexing
21 months ago
bhumm ▴ 170

For some software, I have had to index a reference genome before using blastn. Here is an example command:

makeblastdb -in mydb.fsa -dbtype nucl -parse_seqids

Whenever I do this, I end up with the original FASTA file plus a bunch of extra files made by the command. For example:

mydb.nhr, mydb.nin, mydb.nsd, mydb.nsi, etc.

I haven't found very clear documentation on what this command does or which file is the final indexed genome that I should be using. Any links, information, or explanation would be greatly appreciated.

blastn fasta shell • 2.2k views
21 months ago
GenoMax 147k

makeblastdb creates the index from mydb.fsa, which is your reference file. This produces the set of files you name above. mydb is the basename of your BLAST database and is what you pass to the -db option.
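
As a rough sketch of how the basename is used in practice, the two steps might look like the following. The file names query.fa and hits.tsv and the helper functions build_db and search_db are hypothetical, chosen only for illustration. Passing -out makes the database basename explicit rather than leaving it to be derived from the input file name.

```shell
# Sketch only: build_db and search_db are illustrative helper names,
# and mydb.fsa, query.fa, hits.tsv are assumed file names.

# Build a nucleotide database; $1 = reference FASTA, $2 = database basename.
build_db() {
    makeblastdb -in "$1" -dbtype nucl -parse_seqids -out "$2"
}

# Search the database; $1 = database basename (not any single component file),
# $2 = query FASTA, $3 = output file (tabular format 6).
search_db() {
    blastn -db "$1" -query "$2" -outfmt 6 -out "$3"
}

# Run only if BLAST+ is installed and the assumed input files exist.
if command -v makeblastdb >/dev/null 2>&1 && [ -f mydb.fsa ] && [ -f query.fa ]; then
    build_db mydb.fsa mydb
    search_db mydb query.fa hits.tsv
fi
```

Note that -db receives mydb, never mydb.nhr or mydb.nin; BLAST locates the component files from the shared prefix.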

From LINK:

BLAST+ provides a tool called makeblastdb that converts a subject FASTA file into an indexed and quickly searchable (but not human-readable) version of the same information, stored in a set of similarly named files (often at least three ending in .pin, .psq, and .phr for protein sequences, and .nin, .nsq, and .nhr for nucleotide sequences). This set of files represents the “database,” and the database name is the shared file name prefix of these files.

Files contain the following information LINK:

nhr: deflines
nin: indices
nsq: sequence data
nnd: GI data
nni: GI indices
nsd: non-GI data
nsi: non-GI indices
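
To see which of these component files exist for a given basename, a small helper along these lines could be used. The function name list_db_files is hypothetical, and the sketch assumes a single-volume nucleotide database with only the extensions listed above; other BLAST+ versions may write additional files.

```shell
# Sketch only: list_db_files is an illustrative helper name.
# Prints each component file that exists for the database basename in $1,
# checking only the nucleotide extensions listed above.
list_db_files() {
    db="$1"
    for ext in nhr nin nsq nnd nni nsd nsi; do
        if [ -f "$db.$ext" ]; then
            echo "$db.$ext"
        fi
    done
}
```

For example, `list_db_files mydb` would print mydb.nhr, mydb.nin, and so on for whichever files are present in the current directory.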

Thanks for the explanation and links. So when I specify the database for use, I give the shared prefix of all the 'subfiles', which invokes the fully indexed database?


That is correct.
