You'll be missing some elements: BlastOutput_param, Hsp_qseq, Iteration_stat, etc. So you won't be able to use the generated XML with another tool that requires DTD validation. Generating XML from your text file is like trying to make a cow from a steak.
That said, you could pipe your file through awk (or perl) to build an XML file. Here I'm just using an awk script with your single line (and I don't know the meaning of your columns). Fields I can't derive from your line are hard-coded or left as '?'. For multiple Hits, or multiple Hsps per Hit, you'll have to modify this script.
{
printf("<?xml version=\"1.0\"?>\n");
printf("<BlastOutput>\n");
printf(" <BlastOutput_program>blastn</BlastOutput_program>\n");
printf(" <BlastOutput_version>BLASTN 2.2.25+</BlastOutput_version>\n");
printf(" <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), \"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs\", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>\n");
printf(" <BlastOutput_db>n/a</BlastOutput_db>\n");
printf(" <BlastOutput_query-ID>%s</BlastOutput_query-ID>\n",$1);
printf("<BlastOutput_iterations>\n");
printf("<Iteration>\n");
printf(" <Iteration_iter-num>1</Iteration_iter-num>\n");
printf(" <Iteration_query-len>%s</Iteration_query-len>\n",$3);
printf("<Iteration_hits>\n");
printf("<Hit>\n");
printf(" <Hit_num>%d</Hit_num>\n",hit_num++);
printf(" <Hit_def>%s</Hit_def>\n",$2);
printf(" <Hit_len>?</Hit_len>\n");
printf(" <Hit_hsps>\n");
printf(" <Hsp>\n");
printf(" <Hsp_num>1</Hsp_num>\n");
printf(" <Hsp_bit-score>159.983</Hsp_bit-score>\n");
printf(" <Hsp_score>176</Hsp_score>\n");
printf(" <Hsp_evalue>9.34813e-45</Hsp_evalue>\n");
printf(" <Hsp_query-from>%s</Hsp_query-from>\n",$5);
printf(" <Hsp_query-to>%s</Hsp_query-to>\n",$6);
printf(" <Hsp_hit-from>%s</Hsp_hit-from>\n",$7);
printf(" <Hsp_hit-to>%s</Hsp_hit-to>\n",$8);
printf(" <Hsp_query-frame>???</Hsp_query-frame>\n");
printf(" <Hsp_hit-frame>??</Hsp_hit-frame>\n");
printf(" <Hsp_identity>??</Hsp_identity>\n");
printf(" <Hsp_positive>??</Hsp_positive>\n");
printf(" <Hsp_gaps>?</Hsp_gaps>\n");
printf(" <Hsp_align-len>?</Hsp_align-len>\n");
printf(" </Hsp>\n");
printf(" </Hit_hsps>\n");
printf("</Hit>\n");
printf("</Iteration_hits>\n");
printf("</Iteration>\n");
printf("</BlastOutput_iterations>\n");
printf("</BlastOutput>\n");
}
awk -f file.awk blast.txt
<?xml version="1.0"?>
<BlastOutput>
<BlastOutput_program>blastn</BlastOutput_program>
<BlastOutput_version>BLASTN 2.2.25+</BlastOutput_version>
<BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>n/a</BlastOutput_db>
<BlastOutput_query-ID>NP_417679.1</BlastOutput_query-ID>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-len>41.2</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>0</Hit_num>
<Hit_def>YDL171C</Hit_def>
<Hit_len>?</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>159.983</Hsp_bit-score>
<Hsp_score>176</Hsp_score>
<Hsp_evalue>9.34813e-45</Hsp_evalue>
<Hsp_query-from>837</Hsp_query-from>
<Hsp_query-to>27</Hsp_query-to>
<Hsp_hit-from>13</Hsp_hit-from>
<Hsp_hit-to>1516</Hsp_hit-to>
<Hsp_query-frame>???</Hsp_query-frame>
<Hsp_hit-frame>??</Hsp_hit-frame>
<Hsp_identity>??</Hsp_identity>
<Hsp_positive>??</Hsp_positive>
<Hsp_gaps>?</Hsp_gaps>
<Hsp_align-len>?</Hsp_align-len>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
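With more than one input line, the script above prints a complete document for every line. Here is a minimal sketch of one way to extend it, assuming the file is the standard tab-separated 12-column tabular BLAST output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) with lines grouped by query and then by subject. That column layout is an assumption, not something the question confirms, and tab2xml is a hypothetical name. It opens one Iteration per query, one Hit per subject and one Hsp per line. It still omits every element the tabular format does not carry, so the result will not validate against the DTD.

```sh
# tab2xml: hypothetical helper, reads tabular BLAST hits on stdin.
# ASSUMED columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
tab2xml() {
  awk -F'\t' '
    # Escape XML special characters in identifiers.
    function esc(s) {
      gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s); gsub(/>/, "\\&gt;", s)
      return s
    }
    # Close the currently open Hit, if any.
    function close_hit() {
      if (in_hit) { print "  </Hit_hsps>"; print "</Hit>"; in_hit = 0 }
    }
    # Close the currently open Iteration (and its last Hit), if any.
    function close_iter() {
      close_hit()
      if (in_iter) { print "</Iteration_hits>"; print "</Iteration>"; in_iter = 0 }
    }
    BEGIN {
      print "<?xml version=\"1.0\"?>"
      print "<BlastOutput>"
      print "<BlastOutput_iterations>"
    }
    /^#/ || NF < 12 { next }        # skip comment lines and short lines
    $1 != query || !in_iter {       # new query: start a new Iteration
      close_iter()
      query = $1; hit_num = 0; in_iter = 1
      print "<Iteration>"
      printf " <Iteration_iter-num>%d</Iteration_iter-num>\n", ++iter_num
      printf " <Iteration_query-def>%s</Iteration_query-def>\n", esc($1)
      print "<Iteration_hits>"
    }
    $2 != subject || !in_hit {      # new subject: start a new Hit
      close_hit()
      subject = $2; hsp_num = 0; in_hit = 1
      print "<Hit>"
      printf "  <Hit_num>%d</Hit_num>\n", ++hit_num
      printf "  <Hit_def>%s</Hit_def>\n", esc($2)
      print "  <Hit_hsps>"
    }
    {                               # every remaining line is one Hsp
      print "    <Hsp>"
      printf "      <Hsp_num>%d</Hsp_num>\n", ++hsp_num
      printf "      <Hsp_bit-score>%s</Hsp_bit-score>\n", $12
      printf "      <Hsp_evalue>%s</Hsp_evalue>\n", $11
      printf "      <Hsp_query-from>%s</Hsp_query-from>\n", $7
      printf "      <Hsp_query-to>%s</Hsp_query-to>\n", $8
      printf "      <Hsp_hit-from>%s</Hsp_hit-from>\n", $9
      printf "      <Hsp_hit-to>%s</Hsp_hit-to>\n", $10
      printf "      <Hsp_align-len>%s</Hsp_align-len>\n", $4
      print "    </Hsp>"
    }
    END {
      close_iter()
      print "</BlastOutput_iterations>"
      print "</BlastOutput>"
    }
  '
}
```

Usage would be something like tab2xml < blast.txt > blast.xml. The header is printed once in BEGIN and the footer once in END, which is the main change from the single-line script. If your columns are in a different order, the field numbers ($4, $7 ... $12) need adjusting.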
Can you please include a sample of the output?