Biopython Parsing Of Blast Results - String Format Issue
13.0 years ago
Zach Powers ▴ 340

Hi Biostar,

I have two FASTA files that I have BLASTed against one another, and I am trying to build a dictionary of the top hits in a simple Query:Hit format using Biopython. However, I am running into a problem with the format of the strings. Here is the script:

from Bio.Blast import NCBIXML

test_dictionary = {}
# outfile is the path to the BLAST XML output
blast_records = NCBIXML.parse(open(outfile))
for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            test_dictionary.update({blast_record.query: alignment.title})

and here is an example dictionary entry where the u'' surrounds the value:

u'HA9WEQA08JTIW5': u'gnl|BL_ORD_ID|100 PhosphataseA'

however if I use the print command the values appear correct:

print alignment.title
gnl|BL_ORD_ID|100 PhosphataseA

I am sure this is a simple problem that results from my lack of understanding of precisely how Biopython stores its information, but any suggestions would be appreciated.

thanks zach cp

Edit *** as per DK's answer I ended up using this formulation where I split the output and keep the gene name:

test_dictionary[str(blast_record.query)] = str(alignment.title).split()[1]
biopython • 3.4k views

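The strange u prefix marks a Unicode string in Python 2. It is part of how the interpreter displays the string (its repr), not part of the string itself, which is why print shows the text without it.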
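
A minimal sketch of the difference, written for Python 3 (an assumption on my part; the thread uses Python 2, where the repr would additionally carry the u prefix). Displaying a dictionary shows the repr of each key and value, quotes included, while print shows only the text. The sample title is the one from the question.

```python
# Sample value from the question; the u prefix is also accepted in Python 3.
title = u'gnl|BL_ORD_ID|100 PhosphataseA'

as_repr = repr(title)  # what is shown when a dict or list is displayed
as_text = str(title)   # what print() writes

print(as_repr)
print(as_text)

# The quotes (and, in Python 2, the u prefix) belong to the repr only;
# the text itself is unchanged.
text_is_inside_repr = as_text in as_repr
```

So the value stored in the dictionary is already the right text; only its displayed form differs.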


You might not want to split it like that. If the gene name is multiple words, you'll only get the first word. I've edited my post to get just the gene name.
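
To illustrate with a made-up title (the two-word gene name below is hypothetical, not taken from the BLAST output in the question), here is roughly how the two approaches differ:

```python
# Hypothetical title with a multi-word description after the identifier.
title = "gnl|BL_ORD_ID|100 Phosphatase A"

# Taking element 1 of a plain split keeps only the first word.
first_word_only = title.split()[1]

# Joining everything after the identifier keeps the whole name.
joined_name = " ".join(title.split()[1:])

# Alternative: split once, at the first run of whitespace.
full_name = title.split(None, 1)[1]

print(first_word_only)
print(joined_name)
print(full_name)
```

The join version collapses repeated whitespace inside the name, while the single split leaves it untouched; for titles like the one in the question the two give the same result.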

13.0 years ago

I am not exactly sure what the problem is. Are you saying that the entries inserted into the dictionary are showing up with u'' surrounding the string?

If you want to save a string representation of anything in Biopython, always convert it with str() just to be safe.

So instead of:

test_dictionary.update({blast_record.query:alignment.title})

do this:

test_dictionary.update({str(blast_record.query):str(alignment.title)})

or you can really just do this:

test_dictionary[str(blast_record.query)] = str(alignment.title)

To get just the gene name:

test_dictionary[str(blast_record.query)] = ' '.join(str(alignment.title).split()[1:])
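
Putting the pieces together, here is a minimal sketch of building the Query:Hit dictionary. It assumes the alignments in each record are listed best-first, so the first one is the top hit; the nested loops in the question overwrite the entry on every pass and end up keeping the last alignment instead. The function name top_hits, the second alignment title, the NOHITS query, and the SimpleNamespace stand-in records are all made up for illustration.

```python
from types import SimpleNamespace


def top_hits(blast_records):
    """Map each query to the description of its first alignment."""
    hits = {}
    for record in blast_records:
        if not record.alignments:
            continue  # query with no hits
        title = str(record.alignments[0].title)
        parts = title.split(None, 1)
        # Drop the leading identifier; fall back to the whole title
        # if there is no description after it.
        hits[str(record.query)] = parts[1] if len(parts) > 1 else title
    return hits


# Stand-in objects exposing only the attributes read above
# (.query, .alignments, .title); these are not real Biopython records.
demo_records = [
    SimpleNamespace(query="HA9WEQA08JTIW5", alignments=[
        SimpleNamespace(title="gnl|BL_ORD_ID|100 PhosphataseA"),
        SimpleNamespace(title="gnl|BL_ORD_ID|7 SomethingElse"),
    ]),
    SimpleNamespace(query="NOHITS", alignments=[]),
]
result = top_hits(demo_records)
print(result)
```

With a real file, the same function could be called as top_hits(NCBIXML.parse(handle)), where handle is opened on the BLAST XML output.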

thanks DK. I am learning some basic programming backwards by writing scripts that are getting progressively better. Sometimes the things that are obvious to others are tough to figure out.
