7.9 years ago
vmicrobio
Hi all!
I would like to filter my XML file, removing every block from <Hit> to </Hit> whose <Hit_def> does not contain 'Homo sapiens'. Do you have any idea how to do this simply?
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastn</BlastOutput_program>
<BlastOutput_version>BLASTN 2.2.30+</BlastOutput_version>
<BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>nt</BlastOutput_db>
<BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
<BlastOutput_query-def>R1_mid6_filt_denovo_18-04-16_c1 cov=12.91 len=1158 gc=29.46 nseq=105</BlastOutput_query-def>
<BlastOutput_query-len>1158</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_expect>10</Parameters_expect>
<Parameters_sc-match>2</Parameters_sc-match>
<Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
<Parameters_gap-open>5</Parameters_gap-open>
<Parameters_gap-extend>2</Parameters_gap-extend>
<Parameters_filter>L;m;</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>R1_mid6_filt_denovo_18-04-16_c1 cov=12.91 len=1158 gc=29.46 nseq=105</Iteration_query-def>
<Iteration_query-len>1158</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|18642927|gb|AC105445.3|</Hit_id>
<Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
<Hit_accession>AC105445</Hit_accession>
<Hit_len>128907</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>1925.48</Hsp_bit-score>
<Hsp_score>2134</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>61</Hsp_query-from>
<Hsp_query-to>1144</Hsp_query-to>
<Hsp_hit-from>62962</Hsp_hit-from>
<Hsp_hit-to>61873</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>-1</Hsp_hit-frame>
<Hsp_identity>1080</Hsp_identity>
<Hsp_positive>1080</Hsp_positive>
<Hsp_gaps>6</Hsp_gaps>
<Hsp_align-len>1090</Hsp_align-len>
<Hsp_qseq>GAGATKWCAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTGTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAA------AATGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_qseq>
<Hsp_hseq>GAGATTACAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTCTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAAAAAAATAAGGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_hseq>
<Hsp_midline>||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
</Hsp>
<Hsp>
<Hsp_num>2</Hsp_num>
<Hsp_bit-score>123.915</Hsp_bit-score>
<Hsp_score>136</Hsp_score>
<Hsp_evalue>6.73231e-24</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>74</Hsp_query-to>
<Hsp_hit-from>63185</Hsp_hit-from>
<Hsp_hit-to>63258</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>70</Hsp_identity>
<Hsp_positive>70</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>74</Hsp_align-len>
<Hsp_qseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGASWGAGATKWCAAGTAA</Hsp_qseq>
<Hsp_hseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGACAGAGATGTCAAGTAA</Hsp_hseq>
<Hsp_midline>|||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||||</Hsp_midline>
</Hsp>
<Hsp>
<Hsp_num>3</Hsp_num>
<Hsp_bit-score>49.9773</Hsp_bit-score>
<Hsp_score>54</Hsp_score>
<Hsp_evalue>0.121831</Hsp_evalue>
<Hsp_query-from>1132</Hsp_query-from>
<Hsp_query-to>1158</Hsp_query-to>
<Hsp_hit-from>62138</Hsp_hit-from>
<Hsp_hit-to>62164</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>27</Hsp_identity>
<Hsp_positive>27</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>27</Hsp_align-len>
<Hsp_qseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_qseq>
<Hsp_hseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_hseq>
<Hsp_midline>|||||||||||||||||||||||||||</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>2</Hit_num>
<Hit_id>gi|850484145|gb|CP011891.1|</Hit_id>
<Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
...
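
One possible way to do this filtering is sketched below in Python with the standard-library `xml.etree.ElementTree` module. It assumes the file has the layout shown above, with each <Hit> a direct child of <Iteration_hits> and the species name appearing in <Hit_def>. The function names and the file names in the usage note are made up for illustration. Two caveats: ElementTree does not write the DOCTYPE line back out, and the <Hit_num> values of the remaining hits are left as they were (not renumbered).

```python
import xml.etree.ElementTree as ET


def keep_matching_hits(root, needle="Homo sapiens"):
    """Remove every <Hit> whose <Hit_def> lacks `needle`; return how many were removed."""
    removed = 0
    # Collect the <Iteration_hits> containers first so the tree is not
    # modified while it is being walked.
    for hits in list(root.iter("Iteration_hits")):
        # Loop over a copy of the children because we remove some of them.
        for hit in list(hits):
            if hit.tag != "Hit":
                continue
            hit_def = hit.findtext("Hit_def", default="")
            if needle not in hit_def:
                hits.remove(hit)
                removed += 1
    return removed


def filter_blast_xml(in_path, out_path, needle="Homo sapiens"):
    """Read a BLAST XML file, drop non-matching hits, write the result."""
    tree = ET.parse(in_path)
    removed = keep_matching_hits(tree.getroot(), needle)
    # The DOCTYPE declaration of the input is not preserved here.
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return removed
```

It could be called as `filter_blast_xml("blast.xml", "blast_human.xml")`. The match is a plain case-sensitive substring test, so a hit described as "homo sapiens" in lower case would be removed as well.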