8.6 years ago
SheelS
Hi there, I have a CSV file that looks like the one below, and I would like to convert the genotype codes (A or B) to the corresponding nucleotides (A, T, G, C).
Marker, alleleA, alleleB, id1, id1, id2, id2, # id is people
rs1, G, T, A, A, A, B,
rs2, A, G, B, A, A, A,
rs3, C, T, 0, 0, A, B, # 0 is missing
After converting:
Marker, alleleA, alleleB, id1, id1, id2, id2,
rs1, G, T, G, G, G, T, # for rs1, the 'A' is replaced by G, and 'B' by T
rs2, A, G, G, A, A, A,
rs3, C, T, 0, 0, C, T,
Does anyone know a smart way to do this?
Thanks in advance for any possible solution or advice!
Thanks, but sorry it looks like it does not work,
IndexError: list out of range alleleA = [1]
Oh crap I made a mistake, will edit. Delimiter = ','
Sorry, but issue still there. Any ideas?
This should do the trick now. My bad, the almost-functional script above had some bad pseudocode in it. This script definitely isn't the most efficient if you have millions of lines, but it should do the job. Warning: Python 2.7 syntax. I could make it a few lines shorter, but readability counts and explicit is better than implicit, ya know ;-)
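For reference, here is a minimal sketch of this kind of conversion. It is not the script originally posted in this answer, and it is written for Python 3 rather than 2.7. The names `convert_row` and `convert_genotypes` and the file paths are made up for illustration. It assumes comma-delimited fields with alleleA and alleleB in the second and third columns, and it leaves anything that is not `A` or `B` (such as `0` for missing, or an empty field from a trailing comma) untouched.

```python
import csv

def convert_row(fields):
    """Replace A/B genotype codes with this row's alleleA/alleleB nucleotides."""
    fields = [f.strip() for f in fields]
    if len(fields) < 3:
        # Blank or malformed line: return it as-is instead of raising IndexError.
        return fields
    mapping = {"A": fields[1], "B": fields[2]}
    # Keep marker, alleleA, alleleB; translate only the genotype columns.
    return fields[:3] + [mapping.get(g, g) for g in fields[3:]]

def convert_genotypes(in_path, out_path):
    """Hypothetical helper: convert a whole file, copying the header row."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        header = next(reader, None)
        if header is not None:
            writer.writerow([f.strip() for f in header])
        for row in reader:
            if row:  # skip completely empty lines
                writer.writerow(convert_row(row))

# Example on a single row from the question:
print(convert_row(["rs1", " G", " T", " A", " A", " A", " B"]))
```

Calling `convert_genotypes("genotypes.csv", "converted.csv")` would process a whole file (both file names are placeholders). Rows with fewer than three fields are passed through unchanged, which sidesteps one possible cause of an index error on short or blank lines.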
I wonder where the trailing comma comes from in your code example (at the end of the line). Correct me if I'm wrong, but is that common for a csv file? To easily remove it:
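One way to do that, as a sketch (not necessarily the snippet originally posted here; `strip_trailing_comma` is a made-up helper name, and Python 3 is assumed):

```python
def strip_trailing_comma(line):
    """Drop trailing whitespace and a single trailing comma from a line."""
    line = line.rstrip()
    if line.endswith(","):
        line = line[:-1]
    return line

# Example with a row from the question, then split into clean fields.
raw = "rs1, G, T, A, A, A, B,\n"
fields = [f.strip() for f in strip_trailing_comma(raw).split(",")]
print(fields)  # ['rs1', 'G', 'T', 'A', 'A', 'A', 'B']
```

Without this step, splitting on `,` leaves an empty string as the last field, which is harmless for the A/B replacement but shows up as an extra empty column.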
Sorry but still the same issue,
list out of range
Would you mind if I take your code and ask about it somewhere else? No offense, that way we can find out how to solve it! :)
Well, the code works here. But you can do whatever you want with my code :)