I have BLAST output in XML format and want to retrieve a few attributes, such as <Hit_def>. I found this parser in Biopython:
from Bio.Blast import NCBIXML

# Parse the BLAST XML output; each record corresponds to one query.
with open('output.xml') as handle:
    for record in NCBIXML.parse(handle):
        for align in record.alignments:
            for hsp in align.hsps:
                print(hsp.score, align.hit_def)
Q1: The code above only prints the output to the terminal. How can I store it in a .csv file instead? Specifically, I need output.csv with the attributes <Iteration_query-def>, <Hit_def>, <Hsp_score>, and <Hsp_evalue> as columns.
Q2: How can I get the result for only the best hit of each query? Would setting -max_target_seqs to 1 when running blastp do the same thing?
The following is a segment of my input XML:
<Hit_def>low-density lipoprotein receptor-related protein 6 precursor [Homo sapiens] >gi|578822872|ref|XP_006719141.1| PREDICTED: low-density lipoprotein receptor-related protein 6 isoform X1 [Homo sapiens]</Hit_def>
<Hsp_midline>+N C + C H+CL R G C C GF L+S K C+ V + ++L + R L + V + A+D D VTD+RIY + KT A+ + SA E V +G D + K +YW TG + VS + V + D R + +D +YW E+</Hsp_midline>
<Hsp_midline>NEC S C H+CLA GGFVC C ++L + + S T +V D Q LPI S RNV AID D + D ++Y</Hsp_midline>
I would really appreciate your help.
Using xsltproc rather than Python would be straightforward.
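If you would rather stay in Python, here is a minimal sketch that uses only the standard library (csv and xml.etree.ElementTree) instead of Biopython. The function name blast_xml_to_csv and its best_only flag are made up for illustration. The element names are the ones from the question, and "best hit" is assumed here to mean the HSP with the highest Hsp_score for each query.

```python
import csv
import xml.etree.ElementTree as ET


def blast_xml_to_csv(xml_path, csv_path, best_only=False):
    """Write one CSV row per HSP and return the number of rows written.

    Hypothetical helper: with best_only=True, only the HSP with the
    highest Hsp_score is kept for each query (an assumed definition
    of "best hit").
    """
    rows_written = 0
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(
            ["Iteration_query-def", "Hit_def", "Hsp_score", "Hsp_evalue"]
        )
        # Each <Iteration> element holds the results for one query.
        for iteration in ET.parse(xml_path).getroot().iter("Iteration"):
            query = iteration.findtext("Iteration_query-def")
            rows = []
            for hit in iteration.iter("Hit"):
                hit_def = hit.findtext("Hit_def")
                for hsp in hit.iter("Hsp"):
                    rows.append((
                        query,
                        hit_def,
                        hsp.findtext("Hsp_score"),
                        hsp.findtext("Hsp_evalue"),
                    ))
            if best_only and rows:
                # Keep only the top-scoring HSP for this query.
                rows = [max(rows, key=lambda r: float(r[2]))]
            writer.writerows(rows)
            rows_written += len(rows)
    return rows_written
```

Calling blast_xml_to_csv("output.xml", "output.csv") should cover Q1, and adding best_only=True covers Q2. Filtering afterwards this way does not depend on how -max_target_seqs behaves. Note that this loads the whole XML file into memory, so a very large BLAST run may need a streaming parser instead.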