Question

Tools for Parsing NCBI BLAST -m 7 XML Output Format?

3

13.6 years ago

Lhl ▴ 760

Hi all,

Is there a script or tool that can parse NCBI BLAST XML output (produced with the -m 7 option)?

I want a tab-delimited file containing the following information:

 1. Name of the query sequence             Seq1
 2. Length of the query sequence           30
 3. Name of target sequence                gnl|BL_ORD_ID|0
 4. Length of target sequence              5528445
 5. Alignment bit score                    59.96
 6. E-value                                8.38112e-11
 7. Start of alignment within query        1
 8. End of alignment within query          30
 9. Start of alignment within target       5436010
10. End of alignment within target         5436039
11. Query frame                            1
12. Target frame                           1
13. Number of identical bases within       29
    the alignment
14. Alignment length                       30
15. Aligned portion (sequence) of query    CGGACAGCGCCGCCACCAACAAAGCCACCA
16. Aligned portion (sequence) of target   CGGACAGCGCCGCCACCAACAAAGCCATCA
17. Midline indicating positions of        ||||||||||||||||||||||||||| ||
    matches within the alignment

Thanks.

Elzed

blast xml parsing • 20k views

Updated 3.3 years ago by Yoann Pageaud • written 13.6 years ago by Lhl ▴ 760

score 5 · Answer 1 · 2011-04-07

13.6 years ago

Neilfws 49k

All of the major Bio* projects contain libraries to parse BLAST XML output:

BioPerl - use the Bio::SearchIO module with the option -format => 'blastxml'
Biopython - the tutorial recommends using XML output for parsing
BioRuby - Bio::Blast.reports will read an XML file

Once you figure out how to extract the required fields, writing them out as a tab-delimited (or CSV) file is easy in any of these languages.
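As a minimal sketch of that extraction step in Python, the code below uses only the standard library's xml.etree.ElementTree rather than one of the Bio* packages. It assumes the element names of NCBI's BLAST XML format (Iteration_query-def, Hit_id, Hsp_bit-score, and so on); the function blast_xml_to_rows and the script name blast2tsv.py are hypothetical. Each HSP becomes one row with the 17 columns from the question, in the same order. Hit_id is used for the target name because it matches the question's example (gnl|BL_ORD_ID|0); swap in Hit_def if you want the description line instead.

```python
import sys
import xml.etree.ElementTree as ET

# BLAST XML elements for columns 5-17 of the question, in the same order
# (element names assumed from NCBI's BLAST XML format).
HSP_FIELDS = [
    "Hsp_bit-score", "Hsp_evalue",
    "Hsp_query-from", "Hsp_query-to",
    "Hsp_hit-from", "Hsp_hit-to",
    "Hsp_query-frame", "Hsp_hit-frame",
    "Hsp_identity", "Hsp_align-len",
    "Hsp_qseq", "Hsp_hseq", "Hsp_midline",
]


def blast_xml_to_rows(root):
    """Yield one 17-field row (list of str) per HSP under a BlastOutput root."""
    # Fallback for output that records the query only at file level.
    file_def = root.findtext("BlastOutput_query-def", "")
    file_len = root.findtext("BlastOutput_query-len", "")
    for it in root.iter("Iteration"):  # one Iteration per query sequence
        query = [it.findtext("Iteration_query-def") or file_def,
                 it.findtext("Iteration_query-len") or file_len]
        for hit in it.iter("Hit"):
            # Hit_id matches "gnl|BL_ORD_ID|0"; use Hit_def for the description.
            target = [hit.findtext("Hit_id", ""), hit.findtext("Hit_len", "")]
            for hsp in hit.iter("Hsp"):
                # Missing optional elements (e.g. frames) become empty fields.
                yield query + target + [hsp.findtext(f, "") for f in HSP_FIELDS]


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python blast2tsv.py results.xml > results.tsv
    for row in blast_xml_to_rows(ET.parse(sys.argv[1]).getroot()):
        print("\t".join(row))
```

ET.parse builds the whole document tree in memory, which is fine for modest files; for very large files a streaming parse with iterparse is the safer choice.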

Also, don't forget that running blastall with the -m 8 or -m 9 options will generate tab-delimited output (but if I recall correctly, not including the aligned sequences, which you need).

And the minor ones, too! http://hackage.haskell.org/packages/archive/bio/0.5.0.1/doc/html/Bio-Alignment-BlastXML.html

13.6 years ago by Ketil 4.1k

Thanks, Neilfws. I already have the XML files, which other software needs for annotation, and they contain millions of sequences, so I do not want to wait weeks to redo the BLAST run with -m 8/9.
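For XML files with millions of query sequences, loading the whole tree at once can exhaust memory. A minimal sketch under the same assumptions (NCBI BLAST XML element names, Python standard library only), using xml.etree.ElementTree.iterparse: each Iteration element (one per query) is processed as soon as its end tag is read and is then detached from the tree, so memory use stays roughly bounded by one query's hits. The names stream_blast_xml and blast2tsv_stream.py are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# BLAST XML elements for columns 5-17 of the question (assumed element names).
HSP_FIELDS = [
    "Hsp_bit-score", "Hsp_evalue",
    "Hsp_query-from", "Hsp_query-to",
    "Hsp_hit-from", "Hsp_hit-to",
    "Hsp_query-frame", "Hsp_hit-frame",
    "Hsp_identity", "Hsp_align-len",
    "Hsp_qseq", "Hsp_hseq", "Hsp_midline",
]


def stream_blast_xml(source):
    """Yield one tab-separated line per HSP, reading `source` incrementally."""
    iterations = None         # the BlastOutput_iterations element, once seen
    default_query = ["", ""]  # file-level query def/len, used if an Iteration lacks them
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "BlastOutput_iterations":
                iterations = elem
            continue
        if elem.tag == "BlastOutput_query-def":
            default_query[0] = elem.text or ""
        elif elem.tag == "BlastOutput_query-len":
            default_query[1] = elem.text or ""
        elif elem.tag == "Iteration":
            query = [elem.findtext("Iteration_query-def") or default_query[0],
                     elem.findtext("Iteration_query-len") or default_query[1]]
            for hit in elem.iter("Hit"):
                target = [hit.findtext("Hit_id", ""), hit.findtext("Hit_len", "")]
                for hsp in hit.iter("Hsp"):
                    row = query + target + [hsp.findtext(f, "") for f in HSP_FIELDS]
                    yield "\t".join(row)
            if iterations is not None:
                # Detach the finished query so its hits can be garbage-collected.
                iterations.remove(elem)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python blast2tsv_stream.py results.xml > results.tsv
    for line in stream_blast_xml(sys.argv[1]):
        print(line)
```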

13.6 years ago by Lhl ▴ 760

Ram · Answer 2 · 2011-04-07

You can use XSLT to transform your XML into a tab-delimited table:


<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- plain text output: one tab-separated line per HSP -->
<xsl:output method="text"/>

<xsl:template match="/">
<xsl:apply-templates select="BlastOutput"/>
</xsl:template>

<xsl:template match="BlastOutput">
<!-- one Iteration per query sequence, so files with many queries are handled too -->
<xsl:for-each select="BlastOutput_iterations/Iteration">
<!-- fall back to the file-level query fields if the Iteration does not carry them -->
<xsl:variable name="queryDef">
<xsl:choose>
<xsl:when test="Iteration_query-def"><xsl:value-of select="Iteration_query-def"/></xsl:when>
<xsl:otherwise><xsl:value-of select="/BlastOutput/BlastOutput_query-def"/></xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:variable name="queryLen">
<xsl:choose>
<xsl:when test="Iteration_query-len"><xsl:value-of select="Iteration_query-len"/></xsl:when>
<xsl:otherwise><xsl:value-of select="/BlastOutput/BlastOutput_query-len"/></xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:for-each select="Iteration_hits/Hit">
<!-- element names use underscores: Hit_def, Hit_len -->
<xsl:variable name="hitDef" select="Hit_def"/>
<xsl:variable name="hitLen" select="Hit_len"/>
<xsl:for-each select="Hit_hsps/Hsp">
<xsl:value-of select="$queryDef"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="$queryLen"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="$hitDef"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="$hitLen"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_bit-score"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_evalue"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_query-from"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_query-to"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hit-from"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hit-to"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_query-frame"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hit-frame"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_identity"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_positive"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_gaps"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_align-len"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_qseq"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hseq"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_midline"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Example:

xsltproc --novalid blast2csv.xsl jeter.blast.xml

Result:

No definition line    99            159.983    9.34813e-45    1    99    1    105    1    1    99    99    6    105    ATGCCCGCCCTGCGCCCCGCTCTGCT---GTGGGCGCTGCTGGCGCTCTGGCTGTGCTG---CGCGGCCCCCGCGCATGCATTGCAGTGTCGAGATGGCTATGAA    ATGCCCGCCCTGCGCCCCGCTCTGCTAAAGTGGGCGCTGCTGGCGCTCTGGCTGTGCTGAAACGCGGCCCCCGCGCATGCATTGCAGTGTCGAGATGGCTATGAA    ||||||||||||||||||||||||||   ||||||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||||||||||||||
No definition line    99            66.2076    1.5844e-16    1    36    106    141    1    1    36    36    0    36    ATGCCCGCCCTGCGCCCCGCTCTGCTGTGGGCGCTG    ATGCCCGCCCTGCGCCCCGCTCTGCTGTGGGCGCTG    ||||||||||||||||||||||||||||||||||||

score 1 · Answer 3 · 2011-04-07

13.6 years ago

John ▴ 50

For this purpose you can open the BLAST XML output file in a spreadsheet program as an external data source. You may also find NOBLAST (New Options for BLAST) useful. NOBLAST is an open-source program that provides a new, user-friendly tabular output format for various NCBI BLAST programs (blastn, blastp, blastx, tblastn, tblastx, MegaBLAST and PSI-BLAST) without the need for a parser, and it provides E-value correction when a segmented BLAST database is used. Please read the complete publication here and download it from here.

Hi! I am very new to this field and do not have much experience in bioinformatics. I have downloaded NOBLAST, but I have a question about its installation: do I have to install BLAST on my computer before using it? How can I do that? Thanks a lot.

12.1 years ago by elmagodelabahia ▴ 60

score 1 · Answer 4 · 2011-04-08

13.6 years ago

Dejian ★ 1.3k

BioPerl gives specific advice on dealing with this problem.

Yes, you are right. Many thanks!

13.6 years ago by Lhl ▴ 760

score 0 · Answer 5 · 2021-07-27

I have developed a small R package that can do this. It is available on GitHub here.
Using the function NCBI_BLAST_XML2DT() you can load your NCBI BLAST XML result file as an R data.table.
If you have multiple related XML files, you can do the same thing with the function aggregate_NCBI_BLAST_XMLs2DT().
Follow the documentation in the README for installation, and open an issue in the repository if you have a problem.
Good luck!