Blast Xml For Multiple Databases
13.4 years ago
Yann ▴ 70

Hello, I run BLAST 2.2.25+ with multiple databases. I would like to view my results for each database in my XML result file. Is this possible?

blast xml • 3.5k views

What do you mean by "view my result for each database"?

you could use table output instead of XML and do a grep.... Arrgh, no, pleeease don't kill me... :-)


You could also use [?] on multiple xml files instead of one. Care to explain the reason why you want to have it that way?


Yann should explain this him/herself, but I guess this question is about running a BLAST search against multiple databases in one go and then teasing out the individual databases later. I must admit that I don't really see a problem, as the database identifiers should be part of the hit entry name. In a table, this is easy to parse, and I am sure that the XML gurus can come up with some XSLT magic to do this with XML, too. By the way, I see nothing bad in running BLAST with multiple databases: it is very efficient if there are multiple small databases.
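A minimal sketch of the table-based approach described above, assuming tabular output (-outfmt 6 in BLAST+) and assuming that the text before the first "|" in the subject ID identifies the source database. That naming convention depends on how the databases were built, so treat it as an assumption; the function name and the sample lines are made up for illustration.

```python
from collections import defaultdict


def group_hits_by_prefix(tabular_lines):
    """Group BLAST tabular lines by the prefix of the subject ID (column 2)."""
    groups = defaultdict(list)
    for line in tabular_lines:
        line = line.rstrip("\n")
        # Skip blank lines and the comment lines written by -outfmt 7.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id = fields[1]
        # Assumption: "sp|P12345|..." means the hit came from the "sp" database.
        prefix = subject_id.split("|", 1)[0] if "|" in subject_id else "unknown"
        groups[prefix].append(fields)
    return dict(groups)


# Hand-written sample lines, not real BLAST output.
sample = [
    "q1\tsp|P12345|ABC_HUMAN\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380",
    "q1\ttr|Q99999|Q99999_MOUSE\t80.0\t150\t30\t0\t1\t150\t5\t154\t1e-40\t160",
    "q1\tsp|P67890|DEF_HUMAN\t75.0\t120\t30\t0\t1\t120\t1\t120\t1e-20\t95",
]
grouped = group_hits_by_prefix(sample)
for prefix, hits in sorted(grouped.items()):
    print(prefix, len(hits))
```

If the subject IDs carry no such prefix, every hit lands in the "unknown" group and this approach does not help.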

13.4 years ago
Nabellaleen ▴ 10

If I remember correctly, BLAST can run a search on only one database at a time, no?

To run BLAST on multiple databases, you have to merge them with blastdb_aliastool, which creates a "new" database. That makes it impossible to tell which of the initial databases each result came from.

If you give multiple databases directly to BLAST, I don't know what happens; this is not documented (I didn't find an answer, anyway). So I think that if it doesn't crash, it probably calls blastdb_aliastool.

In any case, on NCBI, Blast XML output is "based" on this DTD : http://www.ncbi.nlm.nih.gov/data_specs/dtd/NCBI_BlastOutput.mod.dtd

It defines a "BlastOutput" element (<!ELEMENT BlastOutput ( ... )>) that contains a "BLAST Database name" child element (<!ELEMENT BlastOutput_db (#PCDATA)>) and is the parent of iteration elements (<!ELEMENT Iteration_hits (Hit*)>; in BLAST+ output, one iteration per query sequence), which are parents of hits (<!ELEMENT Hit ( ... )>), which are in turn parents of HSPs (<!ELEMENT Hsp ( ... )>).

So, if you have multiple databases in an XML result file, there should be a "BlastOutput" element for each database used, which contains the DB name in a "<BlastOutput_db>Your DB Name</BlastOutput_db>" element, and which also contains the results for this database in a "[?]" element.
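The nesting described above can be walked with Python's standard xml.etree.ElementTree module. This is only a sketch: the element names come from the DTD, but the sample document is hand-written and trimmed, and the content of BlastOutput_db shown here is an assumption, since the thread does not settle what BLAST actually writes there for a multi-database run.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed sample; not real BLAST output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_db>data1 data2</BlastOutput_db>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>query1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>lcl|seqA</Hit_id>
          <Hit_def>example sequence A</Hit_def>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_id>lcl|seqB</Hit_id>
          <Hit_def>example sequence B</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def summarize(xml_text):
    """Return the database string and, per iteration, the query and its hit IDs."""
    root = ET.fromstring(xml_text)
    db_name = root.findtext("BlastOutput_db")
    results = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hit_ids = [hit.findtext("Hit_id") for hit in iteration.iter("Hit")]
        results.append((query, hit_ids))
    return db_name, results


db_name, results = summarize(SAMPLE_XML)
print("database(s):", db_name)
for query, hit_ids in results:
    print(query, hit_ids)
```

Running this on a real result file (ET.parse(path).getroot() instead of ET.fromstring) is a quick way to check how many BlastOutput_db values the file really contains.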


@Nabella: which BLAST are you referring to? I routinely use BLAST with multiple databases (although I tend to avoid XML output). The syntax is -db "data1 data2" for the new BLAST+ or -d "data1 data2" for traditional blastall.
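As a sketch of the BLAST+ syntax just mentioned, the following builds the argument list for a multi-database search with XML output (-outfmt 5). The key point is that the database names go into one space-separated argument after -db. The helper name and the file names (query.fasta, result.xml, data1, data2) are placeholders, not anything from the original question.

```python
import shlex


def build_blast_command(program, query_path, databases, out_path):
    """Build a BLAST+ argument list that searches several databases at once."""
    return [
        program,
        "-query", query_path,
        # All database names are passed as ONE argument, separated by spaces.
        "-db", " ".join(databases),
        "-outfmt", "5",  # 5 = XML output in BLAST+
        "-out", out_path,
    ]


cmd = build_blast_command("blastn", "query.fasta", ["data1", "data2"], "result.xml")
# Show the equivalent shell command line, with quoting applied.
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) on a machine where BLAST+ and the databases are installed; passing a list avoids shell quoting problems with the space inside the -db value.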


It is definitely possible to do this without aliastool, as mentioned above, by passing multiple arguments for db. It has the annoying consequence of not telling you which database a particular sequence came from, though.
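One possible workaround for the missing database label, sketched under the assumption that you can list the sequence identifiers of each database separately (for example from the FASTA files the databases were built from): look each hit ID up in those per-database sets. The function name and the sample IDs are hypothetical.

```python
def assign_databases(hit_ids, ids_by_database):
    """Map each hit ID to the sorted list of databases that contain it."""
    assignment = {}
    for hit_id in hit_ids:
        sources = sorted(
            db for db, ids in ids_by_database.items() if hit_id in ids
        )
        # An empty list means the ID was not found in any of the known sets.
        assignment[hit_id] = sources
    return assignment


# Hypothetical identifier sets, one per database.
ids_by_database = {
    "data1": {"seqA", "seqB"},
    "data2": {"seqB", "seqC"},
}
assignment = assign_databases(["seqA", "seqB", "seqD"], ids_by_database)
for hit_id, sources in assignment.items():
    print(hit_id, sources)
```

An ID present in more than one database (seqB here) stays ambiguous, which is a limit of this approach rather than something the lookup can resolve.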


@Nabella, as far as I can see from Yann's previous questions, he already knows how BLAST XML is structured. So I still don't understand what he means by "view".


My comments are based on my (failing?) memory of BLAST command-line use, so... If you say it's possible, I believe you. But my other comments stand: how does BLAST treat these multiple databases? What are the results? ...

However, as Pierre said, maybe that's another subject ;)
