Hello,
I have BLAST output in XML format and I want to retrieve a few attributes, such as <Hit_def>. I found a parser for this in Biopython.
CODE:
    from Bio.Blast import NCBIXML

    # Parse the BLAST XML and print the raw score and hit description of every HSP
    with open('output.xml') as handle:
        for record in NCBIXML.parse(handle):
            for align in record.alignments:
                for hsp in align.hsps:
                    print(hsp.score, align.hit_def)
Q1: The code above only prints the output to the terminal. Could anyone help me store the output in a .csv file instead?
Specifically, I need output.csv with the attributes <Iteration_query-def>, <Hit_def>, <Hsp_score>, and <Hsp_evalue> as columns.
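One possible way to approach Q1, as a minimal sketch rather than a definitive solution: Python's csv module can write one row per HSP. The function names write_hits_csv and blast_xml_to_csv are made up for illustration. The sketch assumes the records come from NCBIXML.parse, where record.query holds <Iteration_query-def>, align.hit_def holds <Hit_def>, hsp.score holds <Hsp_score>, and hsp.expect holds <Hsp_evalue>.

```python
import csv


def write_hits_csv(records, out_path):
    """Write one CSV row per HSP: query, hit description, raw score, E-value."""
    # newline='' lets the csv module control line endings itself
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['Iteration_query-def', 'Hit_def', 'Hsp_score', 'Hsp_evalue'])
        for record in records:
            for align in record.alignments:
                for hsp in align.hsps:
                    writer.writerow([record.query, align.hit_def, hsp.score, hsp.expect])


def blast_xml_to_csv(xml_path, csv_path):
    """Hypothetical wrapper: parse a BLAST XML file with Biopython and write the CSV."""
    # Imported here so that only this function requires Biopython
    from Bio.Blast import NCBIXML

    with open(xml_path) as handle:
        write_hits_csv(NCBIXML.parse(handle), csv_path)
```

Calling blast_xml_to_csv('output.xml', 'output.csv') would then produce the file. The csv module quotes fields automatically, which matters here because <Hit_def> values can contain commas.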
Q2: How can I get the result for only the best hit of each query? Would setting -max_target_seqs to 1 when running blastp do the same?
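For Q2, one option is to filter while parsing instead of at search time. The sketch below keeps only the first HSP of the first hit for each query. It assumes the report lists hits, and the HSPs within each hit, best first, which is how BLAST output is normally ordered. The function name best_hit_rows is hypothetical, and the records are again assumed to be the objects yielded by NCBIXML.parse.

```python
def best_hit_rows(records):
    """Yield (query, hit_def, score, evalue) for the top HSP of the top hit of each query."""
    for record in records:
        if not record.alignments:
            continue  # this query had no hits at all
        best = record.alignments[0]  # assumed: hits are listed best first
        if not best.hsps:
            continue
        hsp = best.hsps[0]  # assumed: HSPs within a hit are listed best first
        yield (record.query, best.hit_def, hsp.score, hsp.expect)
```

Each yielded tuple can be passed straight to csv.writer(...).writerow. Because the filtering happens after the search, the result does not depend on how -max_target_seqs was set.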
The following is a segment of my input XML:
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>comp552019_c3_seq6_V2</Iteration_query-def>
<Iteration_query-len>227</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|148727288|ref|NP_002327.2|</Hit_id>
<Hit_def>low-density lipoprotein receptor-related protein 6 precursor [Homo sapiens] >gi|578822872|ref|XP_006719141.1| PREDICTED: low-density lipoprotein receptor-related protein 6 isoform X1 [Homo sapiens]</Hit_def>
<Hit_accession>NP_002327</Hit_accession>
<Hit_len>1613</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>43.5133894476967</Hsp_bit-score>
<Hsp_score>101</Hsp_score>
<Hsp_evalue>0.000198686946331968</Hsp_evalue>
<Hsp_query-from>43</Hsp_query-from>
<Hsp_query-to>223</Hsp_query-to>
<Hsp_hit-from>589</Hsp_hit-from>
<Hsp_hit-to>767</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>53</Hsp_identity>
<Hsp_positive>79</Hsp_positive>
<Hsp_gaps>24</Hsp_gaps>
<Hsp_align-len>192</Hsp_align-len>
<Hsp_qseq>TNEC--HDSKCEHICLARDAGGFVCKCSPGFTLVSGYK-CVSDSVTDDYILVADLGQKRLFQLPIRKST-----RNVGDLVAIDLDDVTDDRIYAASVIKKTGGLAWFDISAREIV--WGSKRLSRDDAVLSITTGCCNKKVYWTTQTGIYSWDGVSSTPDKLYSVSFFSDA-QIRQVVVDCKANLLYWIEY</Hsp_qseq>
<Hsp_hseq>SNPCAEENGGCSHLCLYRPQG-LRCACPIGFELISDMKTCI---VPEAFLLFSRRADIRRISLETNNNNVAIPLTGVKEASALDFD-VTDNRIYWTDISLKTISRAFMNGSALEHVVEFGL------DYPEGMAVDWLGKNLYW-ADTGTNRIE-VSKLDGQHRQVLVWKDLDSPRALALDPAEGFMYWTEW</Hsp_hseq>
<Hsp_midline>+N C + C H+CL R G C C GF L+S K C+ V + ++L + R L + V + A+D D VTD+RIY + KT A+ + SA E V +G D + K +YW TG + VS + V + D R + +D +YW E+</Hsp_midline>
</Hsp>
<Hsp>
<Hsp_num>2</Hsp_num>
<Hsp_bit-score>39.6613936885231</Hsp_bit-score>
<Hsp_score>91</Hsp_score>
<Hsp_evalue>0.00402563881724524</Hsp_evalue>
<Hsp_query-from>44</Hsp_query-from>
<Hsp_query-to>128</Hsp_query-to>
<Hsp_hit-from>891</Hsp_hit-from>
<Hsp_hit-to>980</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>34</Hsp_identity>
<Hsp_positive>43</Hsp_positive>
<Hsp_gaps>15</Hsp_gaps>
<Hsp_align-len>95</Hsp_align-len>
<Hsp_qseq>NECHDSK--CEHICLARDAGGFVCKCSPGFTLVSGYKCVSDSVTDDYI--------LVADLGQKRLFQLPIRKSTRNVGDLVAIDLDDVTDDRIY</Hsp_qseq>
<Hsp_hseq>NECASSNGHCSHLCLAVPVGGFVCGCPAHYSLNADNRTCSAPTTFLLFSQKSAINRMVIDEQQSPDIILPIH-SLRNV---RAIDYDPL-DKQLY</Hsp_hseq>
<Hsp_midline>NEC S C H+CLA GGFVC C ++L + + S T +V D Q LPI S RNV AID D + D ++Y</Hsp_midline>
</Hsp>
</Hit_hsps>
I would really appreciate your help.
Thanks
Using xsltproc rather than Python would be straightforward.