How is an element accessed in a Python list?
3.7 years ago

Hi,

I am teaching myself bioinformatics. In one of the books I am reading, I came across the following code written in Python. My Python knowledge is only at the beginner level.

from operator import itemgetter
input_file = open("PDBhaemoglobinReport.csv")
output_file = open("PDBhaemoglobinSorted.csv", "w")

table = []
header = input_file.readline()
for line in input_file:
    col = line.split(',')   
    col[3] = float(col[3][1:-1])
    col[4] = int(col[4][1:-2])

    table.append(col)

table_sorted = sorted(table, key = itemgetter(3,4))

output_file.write(header + '\n')
for row in table_sorted:
    row = [str(x) for x in row]
    output_file.write('\t'.join(row) + '\n')
output_file.close()

The following are the first three lines of the file from which the data is read.

PDB ID,Chain ID,Exp. Method,Resolution,Chain Length
"1A4F","A","X-RAY DIFFRACTION","2.00","141"
"1C7C","A","X-RAY DIFFRACTION","1.80","283"

I am completely confused by the following two lines:

col[3] = float(col[3][1:-1])
col[4] = int(col[4][1:-2])

When I tried col[3] = float(col[3]) or col[4] = int(col[4]), the script threw an error. For example, col[1:-1] corresponds to ['"A"', '"X-RAY DIFFRACTION"', '"2.00"']. This list doesn't have an element at index 3, so I am not sure how float(col[3][1:-1]) works.

Thanks

3.7 years ago

To troubleshoot problems like this, you need a proper debugging strategy. First and foremost, if you don't know what

col[3][1:-1]

actually is, then you cannot fix the problem. So focus on understanding your objects.

Unpack your data into separate steps, investigate each element, and find out what each object is with the built-in type and dir functions. Replace:

col[3] = float(col[3][1:-1])

with just

import sys

obj = col[3]

print(type(obj))
print(dir(obj))
sys.exit()  # stop here so only the inspection output is printed

Then do the same for obj = col[3][1:-1] and so on. That way you will know what each object actually is before you use it.

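Note that col[1:-1] and col[3][1:-1] are not the same thing: the second takes element 3 of col first and then slices that element. Slicing the list itself just returns another list, and a list cannot be turned into a float:

x = [1, 2, 3]
print(type(x[1:-1]))
<class 'list'>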
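
As a minimal sketch of how Python evaluates col[3][1:-1] from left to right, here is the first data line from the question taken apart step by step. The trailing newline on the line is an assumption about how it looks when read from the file, and the names field, inner, last, resolution and length are only there for illustration:

```python
# One line as it would come out of the file, including the trailing newline
line = '"1A4F","A","X-RAY DIFFRACTION","2.00","141"\n'

col = line.split(',')        # a list of five strings, quotes still attached
field = col[3]               # indexing the list picks one string: '"2.00"'
inner = field[1:-1]          # slicing that string drops its first and last character
resolution = float(inner)    # '2.00' can now be converted to a float

# The last field also carries the newline, so two characters are dropped at the end
last = col[4]                # '"141"\n'
length = int(last[1:-2])     # '141' converted to an integer

print(repr(field), repr(inner), resolution)
print(repr(last), length)
```

So the index [3] is applied to the list and the slice [1:-1] is applied to the string that comes out of it. That also suggests why the book uses [1:-2] for the last column: it has to drop both the closing quote and the newline.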

Thank you so much for the lead. As per your suggestion, I have tried the following:

obj = col[3]
print(type(obj))
print(obj)
obj = col[3][1:-1]
print(type(obj))
print(obj)

The output was:

<class 'str'>
"2.00"
<class 'str'>
2.00

So, in both cases, it was a string object. Somehow this line, obj = col[3][1:-1], strips off the double-quotes.


If you want to strip characters, use the strip method of the string:

>>> val = '"2.0"'
>>> float(val.strip('"'))
2.0

The above method works for both quoted and unquoted values, whereas slicing with [1:-1] will silently alter (and mess up) your data when there are no quotes.
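
To make the difference concrete, here is a small sketch comparing the two approaches on a quoted and an unquoted value. The sample values and variable names are made up for illustration:

```python
quoted = '"2.00"'
unquoted = '2.00'

# Slicing removes the first and last character no matter what they are
sliced_quoted = float(quoted[1:-1])      # slices to '2.00'
sliced_unquoted = float(unquoted[1:-1])  # slices to '.0', a wrong value but no error

# strip('"') only removes double quotes from the ends, if any are present
stripped_quoted = float(quoted.strip('"'))
stripped_unquoted = float(unquoted.strip('"'))

print(sliced_quoted, sliced_unquoted)
print(stripped_quoted, stripped_unquoted)
```

With slicing, the unquoted value turns into 0.0 without any warning, while strip leaves it at 2.0.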

In this particular case your problem is most likely that you are manually splitting on commas in a quoted CSV file.

for line in open("filename"):
    elems = line.split(",")

Hence you have to deal with stripping quotes. This is not a good solution.

Instead, use the csv module to read CSV files; then no stripping is necessary:

import csv

with open("filename", newline="") as stream:
    for elems in csv.reader(stream):
        print(elems)

The code is simpler and will work properly.
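
For reference, here is a rough sketch of how the whole script from the question might look with the csv module. It is not the book's version: the function name sort_report is hypothetical, the in-memory sample stands in for PDBhaemoglobinReport.csv, and the output is written with csv.writer instead of '\t'.join, so the header is not followed by an extra blank line.

```python
import csv
import io
from operator import itemgetter


def sort_report(in_stream, out_stream):
    """Sort the report by resolution, then chain length, and write it tab-separated."""
    reader = csv.reader(in_stream)
    header = next(reader)            # first row holds the column names
    table = []
    for row in reader:
        row[3] = float(row[3])       # resolution; csv.reader already removed the quotes
        row[4] = int(row[4])         # chain length
        table.append(row)
    table.sort(key=itemgetter(3, 4))
    writer = csv.writer(out_stream, delimiter='\t', lineterminator='\n')
    writer.writerow(header)
    writer.writerows(table)


# In-memory stand-in for the input file, using the lines shown in the question
sample = io.StringIO(
    'PDB ID,Chain ID,Exp. Method,Resolution,Chain Length\n'
    '"1A4F","A","X-RAY DIFFRACTION","2.00","141"\n'
    '"1C7C","A","X-RAY DIFFRACTION","1.80","283"\n'
)
result = io.StringIO()
sort_report(sample, result)
print(result.getvalue())
```

With real files you would pass file objects opened with newline="" (as the csv documentation recommends) in place of the two StringIO objects.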


Thank you so much. What you have suggested is a better solution and clarified my confusion!
