Hi,
I am teaching myself bioinformatics. In one of the books I am reading, I came across the following code written in Python. My Python knowledge is only at the beginner level.
from operator import itemgetter
input_file = open("PDBhaemoglobinReport.csv")
output_file = open("PDBhaemoglobinSorted.csv", "w")
table = []
header = input_file.readline()
for line in input_file:
    col = line.split(',')
    col[3] = float(col[3][1:-1])
    col[4] = int(col[4][1:-2])
    table.append(col)
table_sorted = sorted(table, key = itemgetter(3,4))
output_file.write(header + '\n')
for row in table_sorted:
    row = [str(x) for x in row]
    output_file.write('\t'.join(row) + '\n')
output_file.close()
The following are the first three lines of the file from which the data is read.
PDB ID,Chain ID,Exp. Method,Resolution,Chain Length
"1A4F","A","X-RAY DIFFRACTION","2.00","141"
"1C7C","A","X-RAY DIFFRACTION","1.80","283"
I am completely confused by the following two lines:
col[3] = float(col[3][1:-1])
col[4] = int(col[4][1:-2])
When I tried col[3] = float(col[3]) or col[4] = int(col[4]), the script threw an error. For example, col[1:-1] corresponds to ['"A"', '"X-RAY DIFFRACTION"', '"2.00"']. This list has no element at index 3, so I am not sure how float(col[3][1:-1]) works.
Thanks
Thank you so much for the lead. As per your suggestion, I have tried the following:
The output was:
So, in both cases, it was a string object. Somehow this line, obj = col[3][1:-1], strips off the double-quotes.
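To make the slicing explicit, here is a minimal sketch using one data line from the sample shown in the question. It assumes the line still ends with a newline character, as lines read by iterating over a file normally do. The key point is the order of operations: col[3] first picks one string out of the list, and only then does [1:-1] slice characters off that string. It is not a slice of the list col.

```python
# One data line from the sample file, with the trailing newline that
# iterating over a file object would normally leave in place (assumption).
line = '"1A4F","A","X-RAY DIFFRACTION","2.00","141"\n'

col = line.split(',')

# col[3] is a single string that still contains the quote characters.
print(repr(col[3]))        # '"2.00"'
# [1:-1] drops the first and last character of that string (the quotes).
print(repr(col[3][1:-1]))  # '2.00'

# The last field also carries the newline, so two trailing characters
# (the closing quote and the newline) have to go: hence [1:-2].
print(repr(col[4]))        # '"141"\n'
print(repr(col[4][1:-2]))  # '141'

resolution = float(col[3][1:-1])
chain_length = int(col[4][1:-2])
```

This is also why float(col[3]) fails on its own: the string passed to float still includes the literal double quotes.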
If you want to strip characters, use the strip method of the string. It works for both quoted and unquoted values, whereas the slicing you have will silently alter (and mess up) your data when there are no quotes.
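As a rough illustration of the difference, the sketch below compares str.strip with slicing on a quoted and an unquoted value. The variable names and sample values are made up for the example and are not taken from the original reply.

```python
quoted = '"2.00"'
unquoted = '2.00'

# strip('"') removes leading/trailing double quotes only if they are present.
print(float(quoted.strip('"')))    # 2.0
print(float(unquoted.strip('"')))  # 2.0

# Slicing removes the first and last character no matter what they are,
# so an unquoted value is silently damaged.
print(unquoted[1:-1])              # .0

# For the last field on a line, the newline can be stripped as well.
last_field = '"141"\n'
print(int(last_field.strip('"\n')))  # 141
```

Note that float('.0') is a valid conversion, so the sliced value would not even raise an error; it would just yield the wrong number.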
In this particular case your problem is most likely that you are manually splitting on commas in a quoted CSV file.
Hence you have to deal with stripping quotes. This is not a good solution.
Instead, use the csv module to read CSV files; then no stripping is necessary. The code is simpler and will work properly.
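A minimal sketch of what that could look like follows. The helper name sort_report is hypothetical, and the sample data is fed in through io.StringIO so the example runs without the actual report file. Like the original script, it writes tab-separated output; unlike the original, it also writes the header with tabs, which is a choice made here for consistency.

```python
import csv
import io
from operator import itemgetter

def sort_report(in_stream, out_stream):
    """Sort rows by resolution, then chain length (hypothetical helper)."""
    reader = csv.reader(in_stream)   # handles the quoting for us
    header = next(reader)            # first row is the header
    rows = []
    for row in reader:
        row[3] = float(row[3])       # Resolution: no quotes left to strip
        row[4] = int(row[4])         # Chain Length
        rows.append(row)
    rows.sort(key=itemgetter(3, 4))
    writer = csv.writer(out_stream, delimiter='\t', lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)

# Stand-in for the real input file, using the lines shown in the question.
sample = io.StringIO(
    'PDB ID,Chain ID,Exp. Method,Resolution,Chain Length\n'
    '"1A4F","A","X-RAY DIFFRACTION","2.00","141"\n'
    '"1C7C","A","X-RAY DIFFRACTION","1.80","283"\n'
)
result = io.StringIO()
sort_report(sample, result)
print(result.getvalue())
```

With real files, the same function could be called with objects from open("PDBhaemoglobinReport.csv", newline="") and open("PDBhaemoglobinSorted.csv", "w", newline=""), ideally inside a with statement so both files are closed automatically.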
Thank you so much. What you have suggested is a better solution and clarified my confusion!