[Python] How to convert a list to a dictionary if dict() does not work without using BioPython
8.6 years ago

Hello everybody,

Sample file:

>sp|Q6GZX2|003R_FRG3G  (438 aa)
Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MARPLLGKTSSVRRRLESLSACSIFFFLRKFCQKMASLVFLNSPVYQMSNILLTERRQVDRAMGGSDDDGVMVVALSPSD
FKTVLGSALLAVERDMVHVVPKYLQTPGILHDMLVLLTPIFGEALSVDMSGATDVMVQQIATAGFVDVDPLHSSVSWKDN
VSCPVALLAVSNAVRTMMGQPCQVTLIIDVGTQNILRDLVNLPVEMSGDLQVMAYTKDPLGKVPAVGVSVFDSGSVQKGD
AHSVGAPDGLVSFHTHPVSSAVELNYHAGWPSNVDMSSLLTMKNLMHVVVAEEGLWTMARTLSMQRLTKVLTDAEKDVMR
AAAFNLFLPLNELRVMGTKDSNNKSLKTYFEVFETFTIGALMKHSGVTPTAFVDRRWLDNTIYHMGFIPWGRDMRFVVEY
DLDGTNPFLNTVPTLMSVKRKAKIQEMFDNMVSRMVTS
      2 - 9:          ArpllGKT

Sample code:

 def get_sequence(): 
     try:
         with open("Filename.txt") as f:
             file = f.readlines()
             raw_data = ''
             start_reading = False
             for line in file:
                 if line.startswith(">"):
                     start_reading = True
                 if start_reading:
                     raw_data += line
             sequence = raw_data.split(">") 
             sequence = sequence[1:]       
     except IOError:
         print('Some meaningfull message')
         quit()
     finally:
         print(sequence[0])
         print(sequence[1])
         dict(sequence)
         return sequence

My question is how can I convert the list sequence to a dictionary? It would be really nice if the organism is the key and the value is a list of the other data. The dict() method raises a ValueError.

This is a school assignment, so I'm not allowed to use BioPython.

Thanks in advance!

software error • 11k views

How can I post code without this messed up layout?


Above the text box you write in, there should be a number of box icons which you can use to edit the text... Highlight your code and then click the little box with the ones and zeroes in it.


Thanks! It works.

8.6 years ago
Zaag ▴ 870

Maybe try something like this:

from collections import defaultdict

d = defaultdict(list)
d[sequence[0]].append(sequence[1:])

Thanks for your reply! I've tried your solution and it raises TypeError: first argument must be callable. Now I've done the following:

def maak_dictionary(sequence):
    d = {}
    for k, v in sequence:
        d[k].append(v)
    print(d)

and it raises ValueError: too many values to unpack (expected 2). What can I do? Maybe I should forget about the dictionary and work with a list? But my teacher recommended using a dictionary, and I need to search through the data to find the human sequences that match a specific regex. I cannot ask my teacher for help.
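
For what it's worth, both ValueErrors seem to come from the same cause: each element of `sequence` is one long string, not a (key, value) pair. `dict()` and `for k, v in ...` both need items with exactly two elements, and a plain `{}` has no list to `append` to when the key is new. A minimal sketch of one way around it, assuming each record starts with its header line; `records_to_dict` and the sample strings are made up for illustration:

```python
def records_to_dict(sequence):
    """Turn a list of record strings into {header line: rest of the record}."""
    d = {}
    for record in sequence:
        # Split only at the first newline: header on the left, the rest on the right
        header, _, rest = record.partition("\n")
        d[header.strip()] = rest
    return d


# Made-up records in the shape produced by raw_data.split(">")[1:]
sequence = [
    "sp|P1|A_TEST  (4 aa)\nProtein A.  [Test organism]\nMARP\n",
    "sp|P2|B_TEST  (3 aa)\nProtein B.  [Other organism]\nGKT\n",
]

d = records_to_dict(sequence)
print(d["sp|P1|A_TEST  (4 aa)"])
```

Here the header line is the key; any other piece of the record could be used instead, as long as every record yields exactly one key and one value.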

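from collections import defaultdict

d = defaultdict(list)
header = None
raw_data = ''

with open("Filename.txt") as f:
    for line in f:
        # strip() returns a new string, so the result has to be assigned
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: store the previous one first
            if header is not None and '[' in header:
                d[header].append(raw_data)
            header = line
            raw_data = ''
        elif header is None:
            # Ignore anything before the first ">" line
            continue
        elif '[' in line:
            # Description line with the organism in brackets: part of the header
            header += ' ' + line
        else:
            raw_data += line

# Store the last record, which is not followed by another ">" line
if header is not None and '[' in header:
    d[header].append(raw_data)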

This seems to get it into a dict, but you'll need to do some parsing on the text, I guess.
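
As a rough idea of what that parsing could look like: the organism sits between the first `[` and the last `]` of the header text, so it can be cut out with `str.find` and `str.rfind`. This is only a sketch; `organism_from_header` is a made-up helper name, and it assumes the organism is the only bracketed part of the header, as in the sample file:

```python
def organism_from_header(header):
    """Return the text between the first '[' and the last ']', or None."""
    start = header.find("[")
    end = header.rfind("]")
    if start == -1 or end < start:
        return None
    return header[start + 1:end].strip()


# Header text taken from the sample file in the question
header = (">sp|Q6GZX2|003R_FRG3G  (438 aa)\n"
          "Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]\n")
print(organism_from_header(header))  # Frog virus 3 (isolate Goorha) (FV-3)
```

Using the last `]` rather than the first keeps the round brackets inside the organism name intact.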

8.6 years ago
lmanohara99 ▴ 20
try:
    with open("sequence.txt") as f:
        raw_data = ''
        start_reading = False
        for line in f:
            # Skip anything before the first ">" header
            if line.startswith(">"):
                start_reading = True
            if start_reading:
                raw_data += line
    # Split into records; the element before the first ">" is empty
    sequence = raw_data.split(">")[1:]
except IOError:
    print('Some meaningful message')
    quit()
else:
    # Runs only if the file could be read, so sequence is defined here
    sequenceDict = dict(mouse=sequence)
    print(sequenceDict)

I think this should help you. Please refer to https://docs.python.org/2/tutorial/datastructures.html#dictionaries


Thanks for your reply! This is the best solution I've seen so far. There is only one problem. I made a test file with 2 records similar to the one shown in the example above. The original file has hundreds of records. The program places everything under the same key. Is there a possibility to place every record under a different key?

8.6 years ago
lmanohara99 ▴ 20

In that case, try something like this, because you cannot add keys dynamically with the solution above.

sequenceDict = {}
for index, elem in enumerate(sequence):
    sequenceDict[index] = elem
print(sequenceDict)
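
If a numeric index is not a very useful key, the accession from the header line could serve instead, since it should differ for every record. A sketch, assuming each record starts like the sample (`sp|Q6GZX2|003R_FRG3G`) with the accession between the first two `|` characters; `key_by_accession` is a made-up name and the records below are shortened, made-up examples:

```python
def key_by_accession(sequence):
    """Map accession -> record for records whose header looks like 'sp|ACC|NAME ...'."""
    result = {}
    for index, elem in enumerate(sequence):
        fields = elem.split("|", 2)
        # Fall back to the index when the header does not have the expected shape
        key = fields[1] if len(fields) == 3 else index
        result[key] = elem
    return result


# Shortened, made-up records in the shape produced by raw_data.split(">")[1:]
sequence = [
    "sp|Q6GZX2|003R_FRG3G  (438 aa)\nUncharacterized protein 3R.\nMARPLL\n",
    "no pipes in this header\nGKT\n",
]
sequenceDict = key_by_accession(sequence)
print(list(sequenceDict))
```

The fallback to the index is just one choice; raising an error for malformed headers would be equally reasonable.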
8.6 years ago

You want Frog virus 3 (isolate Goorha) (FV-3) to be the key in your dictionary and the rest the value? Assuming that and assuming that your file is not too big (and as such can be comfortably kept in memory) I have some code you could try...

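import sys

def makedict(datalist):
    # Replace both brackets with a marker that should not occur in the file,
    # then split on that marker to isolate the species name
    data = ' '.join(datalist).replace('[', '%@%@').replace(']', '%@%@')
    info = data.split('%@%@')
    return (info[1], [info[0], info[2]])

with open(sys.argv[1]) as infile:
    outdict = {}
    # Strip every line and drop the empty ones
    content = [line.strip() for line in infile if line.strip()]
    tempdata = []
    for line in content:
        if line.startswith('>'):
            # A new record starts: convert the previous one, if there is one
            if tempdata:
                key, value = makedict(tempdata)
                outdict[key] = value
            tempdata = [line]
        else:
            tempdata.append(line)
    else:
        # The loop has finished: convert the last record as well
        if tempdata:
            key, value = makedict(tempdata)
            outdict[key] = value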

Notice:

- the list comprehension, which strips each line and drops the empty ones

- the else clause on the for loop, which also converts the last entry

- the lines of one record are collected in tempdata, which is reset once a dict key and value have been created from it

- in the makedict function, both brackets are replaced by a marker that should not occur anywhere in the file; splitting on that marker isolates the species name

- the makedict function returns a tuple: the key first, the rest of the info second

Since I do not have your (complete) input file I haven't been able to test it, so let me know if something goes wrong that you can't fix.
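
One thing to watch with this approach: if several records share a species, `outdict[key] = value` keeps only the last one. A hedged sketch of one way around that, which also covers the regex search for human sequences mentioned earlier in the thread: collect the values in a list per species, then scan one species with `re.search`. The helper names, the `Homo sapiens` key, the pattern and the data are all made up for illustration:

```python
import re
from collections import defaultdict


def group_by_species(pairs):
    """Collect (species, value) pairs into {species: [value, ...]}."""
    grouped = defaultdict(list)
    for species, value in pairs:
        grouped[species].append(value)
    return grouped


def find_matches(grouped, species, pattern):
    """Return the [description, sequence] values of one species whose sequence matches."""
    regex = re.compile(pattern)
    return [value for value in grouped.get(species, [])
            if regex.search(value[1])]


# Made-up pairs in the shape returned by makedict: (species, [description, sequence])
pairs = [
    ("Homo sapiens", [">sp|X1|A_HUMAN", "MAGKTSS"]),
    ("Homo sapiens", [">sp|X2|B_HUMAN", "MLLVVA"]),
    ("Frog virus 3", [">sp|X3|C_FRG3G", "MAGKTLL"]),
]
grouped = group_by_species(pairs)
hits = find_matches(grouped, "Homo sapiens", r"GKT[ST]")
print([value[0] for value in hits])  # ['>sp|X1|A_HUMAN']
```

Note that the sequence string built by makedict contains spaces where the lines were joined, so it would probably need something like `.replace(' ', '')` before a pattern can match across line boundaries.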
