Parse NCBIXML Output Into A Python List Of Hits
12.9 years ago
Zach Powers ▴ 340

I would like to parse an NCBIXML file to obtain a list of the format:

known_results[i]=(query title, (hit_name,hit_name,hit_name....))

However I am having trouble getting the slice operator to work:

    knowns = "output.xml" 
    i=0
    for record in NCBIXML.parse(open(knowns)): 
        print record.query_id
        known_results[i] = record.query_id     
        known_results[i][1] = (align.title for  align in i.record.alignment)     
        i+=1

which results in:

list assignment index out of range

Since I can do known_results[1] = "sample text", I think the problem is that I cannot use the slice method with a variable.

Can anyone suggest an alternative way to create this list?

thanks zach cp


Crossposted, with an answer, at StackOverflow.


There are two good answers on StackOverflow. The first uses list.append(), the second uses dictionaries. The major problem with my construct is that you cannot assign values to parts of a list that have yet to be created.
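
The list.append() approach mentioned above might look roughly like the sketch below. It is not the StackOverflow answer itself. The helper name `collect_hits` and the stand-in `Record` and `Alignment` tuples are made up for illustration, and the code only assumes that each record has a `query_id` and an `alignments` sequence whose items have a `title`, as Biopython's parsed BLAST records do.

```python
from collections import namedtuple


def collect_hits(records):
    """Build [(query_id, (hit_title, ...)), ...] from parsed BLAST records."""
    known_results = []  # start empty; append grows the list one entry at a time
    for record in records:
        titles = tuple(align.title for align in record.alignments)
        known_results.append((record.query_id, titles))
    return known_results


# Hypothetical stand-ins so the sketch runs without a BLAST XML file.
Alignment = namedtuple("Alignment", "title")
Record = namedtuple("Record", "query_id alignments")

demo = [
    Record("q1", [Alignment("hitA"), Alignment("hitB")]),
    Record("q2", []),
]
print(collect_hits(demo))
```

With a real file, the same function could be called as `collect_hits(NCBIXML.parse(handle))`, where `handle` is the opened XML file.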

biopython list • 3.8k views

Is known_results a list, or a list of lists?


It's a list of lists. The answer on the best ways to do this is at the StackExchange link.

12.9 years ago

It's probably better to use a dictionary in this case:

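    from Bio.Blast import NCBIXML

    knowns = "output.xml"
    known_results = {}
    with open(knowns) as handle:
        for record in NCBIXML.parse(handle):
            print(record.query_id)
            # record.alignments holds one alignment object per hit
            known_results[record.query_id] = [align.title for align in record.alignments]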

Now if you want to access your data, you can:

    for queryID, alignments in known_results.items():
        print(queryID)
        for alignment in alignments:
            pass  # do stuff with alignment (here, the hit title string)