Python script modifications to get tabular BLAST output from DEG (Database of Essential Genes)
7.5 years ago
ppnana • 0

I have a script from GitHub that converts BLAST output from XML to tabular format: https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py. When I run the script, it gives the following error:

Traceback (most recent call last):
  File "blastxml.py", line 328, in <module>
    convert(in_file, outfile)
  File "blastxmlexcel_sacred.py", line 184, in convert
    if re_default_query_id.match(qseqid):
TypeError: expected string or buffer

The only difference between the two BLAST results is that normal BLAST output has these elements:

<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_15661</Iteration_query-ID>
  <Iteration_query-def>gi|927988967|gb|ALE41209.1| GDP-mannose 4,6-dehydratase [mycobacterium]</Iteration_query-def>
  <Iteration_query-len>340</Iteration_query-len>
<Iteration_hits>

but my BLAST result instead has:

<Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
    <Iteration_hits>

Kindly suggest the changes required.

thanks

python biopython script essential genes database • 2.0k views

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.


thanks WouterDeCoster

7.5 years ago
jonasmst ▴ 410

That error is thrown when matching regular expressions with the re module. It's complaining that qseqid is not a string, so I'm assuming it's an int (a number, e.g. 42). Now it doesn't seem like you've linked the right script, as line 328 in the one you linked to is a commented-out print statement, which would not throw an error.

Wherever appropriate, you need to sanity check that qseqid is indeed a string. A crude way to do so is to add

qseqid = str(qseqid)

before the call to match().
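
One caveat with that crude fix: str(None) is the string "None", so match() stops raising but a missing ID is silently carried along. Here is a minimal sketch of a stricter check, assuming Python 3. The helper name match_query_id and the Query_\d+ pattern are hypothetical, made up for illustration and not taken from the linked script.

```python
import re

# Hypothetical pattern for illustration only; the real script defines its own.
re_example_query_id = re.compile(r"^Query_\d+$")


def match_query_id(pattern, qseqid):
    """Match qseqid against pattern, failing loudly if it is not a string."""
    if not isinstance(qseqid, str):
        # Covers None (missing tag) as well as ints and other types.
        raise ValueError("expected a query ID string, got %r" % (qseqid,))
    return pattern.match(qseqid)


# str() hides the problem instead of reporting it:
hidden = str(None)  # the four-character string "None"
```

Calling match_query_id(re_example_query_id, None) raises a ValueError that names the offending value, which is easier to trace back to the input file than the TypeError from re.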

EDIT: Here's what I think is happening.

qseqid is read from an XML-file you're providing, and is taken from a tag called Iteration_query-ID:

qseqid = elem.findtext("Iteration_query-ID")

Usually, that's some value like Query_15661 (as you provided in your question). Now, since your blast results don't have that tag, I'm guessing

qseqid = elem.findtext("Iteration_query-ID")

returns None. And None is not a "String or buffer" so the call to match() fails:

>>> import re
>>> p = re.compile("ab*")
>>> p.match(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected string or buffer
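
The first half of that guess can be checked in isolation too. This is a small sketch, assuming the script parses with the standard ElementTree API (xml.etree.ElementTree here); the two XML snippets are cut-down versions of the ones in the question.

```python
import xml.etree.ElementTree as ET

# Iteration element as in "normal" BLAST XML (trimmed).
with_id = ET.fromstring(
    "<Iteration>"
    "<Iteration_iter-num>1</Iteration_iter-num>"
    "<Iteration_query-ID>Query_15661</Iteration_query-ID>"
    "</Iteration>"
)

# Iteration element as in the question's output: no query-ID tag.
without_id = ET.fromstring(
    "<Iteration>"
    "<Iteration_iter-num>1</Iteration_iter-num>"
    "</Iteration>"
)

found = with_id.findtext("Iteration_query-ID")
missing = without_id.findtext("Iteration_query-ID")

print(repr(found))
print(repr(missing))
```

findtext() returns the tag's text when the tag exists and None (its default) when it does not, which is the value that then reaches match().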

EDIT 2: To conclude, the code you are using does not support your BLAST results. You'll either have to use something else, modify the code to be robust against missing tags, or insert dummy values for the tags in your BLAST results. I'm not that familiar with BLAST, let alone what you're trying to do here, so I can't tell you which solution is better.
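
As a rough illustration of the "robust against missing tags" option, here is a sketch only, not a patch for the linked script. The helper get_query_id and the "iteration_N" placeholder format are hypothetical choices; only the tag names come from the question.

```python
import xml.etree.ElementTree as ET


def get_query_id(iteration):
    """Hypothetical helper: return a usable query ID for one <Iteration>.

    Tries Iteration_query-ID first, then the first word of
    Iteration_query-def, then a placeholder built from Iteration_iter-num.
    """
    qseqid = iteration.findtext("Iteration_query-ID")
    if qseqid:
        return qseqid

    qdef = iteration.findtext("Iteration_query-def")
    words = qdef.split() if qdef else []
    if words:
        return words[0]

    # Arbitrary placeholder format, chosen for this sketch.
    iter_num = iteration.findtext("Iteration_iter-num", default="unknown")
    return "iteration_" + iter_num


# Cut-down version of the Iteration element from the question.
bare = ET.fromstring(
    "<Iteration>"
    "<Iteration_iter-num>1</Iteration_iter-num>"
    "<Iteration_hits></Iteration_hits>"
    "</Iteration>"
)
print(get_query_id(bare))
```

The same kind of fallback would presumably be needed for any other tag the script reads that your output lacks, for example Iteration_query-len, and a placeholder ID means the tabular output no longer identifies the real query sequence.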
