Question

how to get the strand from a blast xml file

0

Entering edit mode

7.1 years ago

bio90029 ▴ 10

Hi, I am working with a blast xml as the one below:

 <Hit_num>2</Hit_num>
  <Hit_id>gnl|BL_ORD_ID|79</Hit_id>
  <Hit_def>NODE_82_length_56137_cov_22.281775</Hit_def>
  <Hit_accession>79</Hit_accession>
  <Hit_len>56195</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>80.5295</Hsp_bit-score>
      <Hsp_score>46</Hsp_score>
      <Hsp_evalue>3.65762e-15</Hsp_evalue>
      <Hsp_query-from>274</Hsp_query-from>
      <Hsp_query-to>515</Hsp_query-to>
      <Hsp_hit-from>17046</Hsp_hit-from>
      <Hsp_hit-to>17290</Hsp_hit-to>
      <Hsp_query-frame>1</Hsp_query-frame>
      <Hsp_hit-frame>1</Hsp_hit-frame>
      <Hsp_identity>182</Hsp_identity>
      <Hsp_positive>182</Hsp_positive>
      <Hsp_gaps>13</Hsp_gaps>
      <Hsp_align-len>250</Hsp_align-len>
      <Hsp_qseq>ATGCGCGAAAAAGACGAGAAAATTCCGCTGCGTCACGGTGTGCTGGGTCAGGAAACCTGCGGCCCGGGTGGGATCTCTTACGGCATGCGCTCCAT-CGGCGGCGTACTGG-A-A-CTGGTGGATTATATGGAGCAGTACTC-CCCG--AACGCGTGGATGCTGAACTACTCCAACCCGGCGGCGATTGTGGCGGAAG-CGACGCGCCGCCTGCGTCCGCATGCCAAAATTCTCAACATCTGCGATATGCC</Hsp_qseq>
      <Hsp_hseq>ATGCGCGAGCAGGATGAAAAGATCCCGCTGCGCCACGGGGTGGTCGGCCAGGAAACCTGCGGACCCGGCGGGCTGGCATATGGCCTGCGTACTATCCTGC--CG-A-TGGTAGAGCTGATCGATCGGGTGGAACGCTACGCGCACGAAAAGGCGTGGATCGTGAACTACTCCAACCCGGCGGCGATTGTCGCTGAGGGCGT-GCGTCGCCTGCGCCCCAACGCACGCGTGCTGAACATCTGCGATATGCC</Hsp_hseq>
      <Hsp_midline>||||||||  | || || || || |||||||| ||||| ||| | || |||||||||||||| || || ||| |  | || ||| ||||  | || | ||  || | ||| | | ||| | |||    |||| |  ||| | | ||  || ||||||||  |||||||||||||||||||||||||||| || || | ||  ||| |||||||| ||  | ||     | || |||||||||||||||||</Hsp_midline>
    </Hsp>

I am using a script based on the one provide by biopython Cook's book (below) but this is what I get;

(None, None)

from Bio.Blast import NCBIXML
 blast_record = NCBIXML.read(result_handle)
for alignment in blast_record.alignments:
... for hsp in alignment.hsps:
        print hsp.strand

Can anyone help me to extract in which strand the subj.sequence is, please? I would like to do this so I can work with the sequence. What I would like to do is to get to the contigs and add the missing bases but if it is in the strand -1 I need to reversed_complement() the sequence. Thanks

python biopython • 2.3k views

ADD COMMENT • link updated 6.9 years ago by Tm ★ 1.1k • written 7.1 years ago by bio90029 ▴ 10

score 2 · Accepted Answer · 2017-12-22

2

Entering edit mode

6.9 years ago

Tm ★ 1.1k

you may try greping <hsp_qseq> and <hsp_hseq> lines showing alignment of query with that of subject (irrepective of the strand in which it belong) and can replace the missing base in contigs.

ADD COMMENT • link 6.9 years ago by Tm ★ 1.1k

0

Entering edit mode

Thanks, Unluckily, I cannot used grep as I wanted it as part of a python programme I was working on. By now I had finished it. I worked out the strand by the start and end positions.

ADD REPLY • link 6.9 years ago by bio90029 ▴ 10