Question

Biopython Parsing Of Blast Results - String Format Issue

13.0 years ago

Zach Powers ▴ 340

Hi Biostar,

I have two FASTA files that I have blasted against one another, and I am trying to build a dictionary of the top hits in a simple Query:Hit format using Biopython. However, I am running into a problem with the format of the strings. Here is the script:

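from Bio.Blast import NCBIXML

test_dictionary = {}
blast_records = NCBIXML.parse(open(outfile))  # outfile: path to the BLAST XML output
for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            # later alignments overwrite earlier ones for the same query
            test_dictionary.update({blast_record.query: alignment.title})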

Here is an example dictionary entry, where u'' surrounds both the key and the value:

u'HA9WEQA08JTIW5': u'gnl|BL_ORD_ID|100 PhosphataseA'

However, if I print the value, it appears correct:

print alignment.title
gnl|BL_ORD_ID|100 PhosphataseA

I am sure this is a simple problem that results from my not understanding precisely how Biopython stores its information, but any suggestions would be appreciated.

thanks zach cp

Edit *** as per DK's answer I ended up using this formulation where I split the output and keep the gene name:

test_dictionary[str(blast_record.query)] = str(alignment.title).split()[1]

biopython • 3.4k views

ADD COMMENT • link updated 13.0 years ago by Damian Kao 16k • written 13.0 years ago by Zach Powers ▴ 340

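The u prefix marks a Unicode string in Python 2. It is part of how the string is displayed (its repr), not part of the string itself, which is why print shows the value without it.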
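
A minimal sketch of the difference, using the title value from the question. It is written so that it runs on both Python 2 and Python 3; on Python 3 the repr has no u prefix, because every str is already Unicode there.

```python
# Value copied from the question; the u prefix is accepted on Python 2 and 3.
title = u'gnl|BL_ORD_ID|100 PhosphataseA'

# repr() is what the interpreter uses when it displays a dictionary.
# On Python 2 it includes the u prefix; on Python 3 it does not.
shown_in_dict = repr(title)

# print() writes the characters of the string itself, without prefix or quotes.
print(shown_in_dict)
print(title)
```

In both cases the stored text is the same; only the displayed form differs.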

ADD REPLY • link 13.0 years ago by Peter 6.0k

You might not want to split it like that. If the gene name is multiple words, you'll only get the first word. I've edited my post to get just the gene name.
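
To illustrate with a made-up multi-word title (the value below is hypothetical, not taken from real BLAST output), compare indexing the split result with splitting only once:

```python
# Hypothetical title whose gene name contains several words.
title = 'gnl|BL_ORD_ID|100 Phosphatase A subunit'

# split()[1] keeps only the first word after the identifier.
first_word_only = title.split()[1]

# Splitting once on whitespace keeps everything after the identifier.
full_name = title.split(None, 1)[1]

print(first_word_only)  # Phosphatase
print(full_name)        # Phosphatase A subunit
```

Note that both forms raise an IndexError if the title contains no whitespace at all.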

ADD REPLY • link 13.0 years ago by Damian Kao 16k

score 2 · Answer 1 · 2011-12-16

13.0 years ago

Damian Kao 16k

I am not exactly sure what the problem is. Are you saying that the entries inserted into the dictionary are showing up with u'' surrounding the string?

If you want to save a string representation of anything in Biopython, convert it explicitly with str() just to be safe.

So instead of:

test_dictionary.update({blast_record.query:alignment.title})

do this:

test_dictionary.update({str(blast_record.query):str(alignment.title)})

or you can really just do this:

test_dictionary[str(blast_record.query)] = str(alignment.title)

To get just the gene name:

test_dictionary[str(blast_record.query)] = ' '.join(str(alignment.title).split()[1:])
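
Since the goal in the question is the top hit per query, here is a rough sketch of one way to combine these pieces. It assumes the alignments of each record are listed best-first, as BLAST normally reports them, and it uses small stand-in objects instead of real parser output so that it runs without Biopython. The names Alignment, Record and top_hits are invented for this illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for the objects yielded by NCBIXML.parse(); only the
# attributes used in this thread (query, alignments, title) are modelled.
Alignment = namedtuple('Alignment', ['title'])
Record = namedtuple('Record', ['query', 'alignments'])


def top_hits(blast_records):
    """Map each query to the gene name of its first listed alignment."""
    hits = {}
    for record in blast_records:
        if not record.alignments:
            continue  # this query had no hits
        title = str(record.alignments[0].title)
        # Drop the first whitespace-delimited token (the gnl|BL_ORD_ID|N part).
        hits[str(record.query)] = ' '.join(title.split()[1:])
    return hits


records = [
    Record('HA9WEQA08JTIW5', [Alignment('gnl|BL_ORD_ID|100 PhosphataseA'),
                              Alignment('gnl|BL_ORD_ID|7 KinaseB')]),
    Record('QUERY_WITHOUT_HITS', []),
]
print(top_hits(records))
```

With real data, the same function should accept the iterator returned by NCBIXML.parse, since it only reads the query, alignments and title attributes. Looping over every alignment and assigning each time, by contrast, leaves the last listed hit in the dictionary rather than the first.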

ADD COMMENT • link 13.0 years ago by Damian Kao 16k

thanks DK. I am learning some basic programming backwards by writing scripts that are getting progressively better. Sometimes the things that are obvious to others are tough to figure out.

ADD REPLY • link 13.0 years ago by Zach Powers ▴ 340