Question

Split Data From A Csv Cell Into Smaller Cells

0

13.3 years ago

RossCampbell ▴ 140

I have a csv file containing matrices of nucleotide frequencies for each position in an alignment. Essentially, I have a PWM saved as a csv file. Unfortunately, my script saved each set of frequencies in one cell. For example, in cell B1, I have {A: 0.25, C: 0.25, G: 0.25, T: 0.25}. What I need is for each cell to be more like: B1; {A:0.25} B2; {C:0.25} B3; {G:0.25} B4; {T:0.25}. Is there a way to split each frequency into its own cell like this?

The Python code that I used to write the CSV:

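# Uses the older Biopython API (Bio.Motif, Bio.Alphabet) and Python 2 file modes
import csv
from Bio import Motif, SeqIO
from Bio.Alphabet import IUPAC

alphabet = IUPAC.unambiguous_dna
m = Motif.Motif(alphabet)
writer = csv.writer(open('filename.csv', 'wb', buffering=0))
for seq_record in SeqIO.parse("filename.fasta", "fasta", alphabet=alphabet):
    m.add_instance(seq_record.seq)
    PWM = m.pwm()
    # Writes two rows: the record ID, then the PWM with one frequency set per cell
    writer.writerows([[seq_record.id], PWM])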

format parsing python • 5.2k views

ADD COMMENT • link updated 13.3 years ago by Istvan Albert 101k • written 13.3 years ago by RossCampbell ▴ 140

4

Could it be that you are opening the resulting CSV file in Excel without specifying the comma as the delimiter? Otherwise "A: 0.25, C: 0.25, G: 0.25, T: 0.25" should be split into different cells, since it contains commas.

ADD REPLY • link 13.3 years ago by Michael Schubert ★ 7.1k

1

You can add an extra step here. When you open the file in Excel, select all the data, then in the "Data" tab click "Text to Columns" and choose "Comma" as the delimiter. This should split the data into 4 different cells.

ADD REPLY • link 13.3 years ago by Gjain 5.8k

0

Great idea. I'll try that and see what happens.

ADD REPLY • link 13.3 years ago by RossCampbell ▴ 140

0

Oddly enough, specifying a comma as the delimiter didn't affect the format. Changing it to ' ' or '}' (both commonly occurring in my file) did though. I'm not really sure why the comma didn't.

ADD REPLY • link 13.3 years ago by RossCampbell ▴ 140
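
A likely explanation for the comma behaviour, sketched below under the assumption that each PWM column is a plain dict: csv.writer converts a dict to its string form, and because that string contains commas the whole field is wrapped in quotes. A CSV-aware reader such as Excel then treats the quoted text as one cell, so the commas inside it are not used as delimiters. The ID "seq1" and the frequencies are illustrative values, not taken from the real file.

```python
import csv
import io

# One PWM column as a plain dict (illustrative values from the question).
column = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

buffer = io.StringIO()
writer = csv.writer(buffer)
# A dict is neither a string nor a number, so csv.writer calls str() on it
# and writes the result as a single field, quoted because it contains commas.
writer.writerow(["seq1", column])
line = buffer.getvalue()
print(line)

# Reading the line back shows two fields, not five: the commas inside the
# quotes are treated as data rather than as delimiters.
fields = next(csv.reader(io.StringIO(line)))
print(len(fields))
```

Splitting on ' ' or '}' works in Excel because those characters are applied to the cell text itself, after the quoted field has already been read as a single value.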

score 1 · Answer 1 · 2011-08-18

What you will need to do is to format the data structure that you get out from the pwm() method. You should also use the writerow() method rather than the writerows() one. Finally you might want to impose an order rather than relying on a default dictionary order.

Your code will then look approximately like this (I don't have similar code at hand, so I can't test it):

row = [ sequenceid ]
for letter in "ATGC":
    row.append( pwmdata[letter] )
writer.writerow( row )

where pwmdata is a dictionary-like object that holds your frequencies.
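
If the PWM holds several positions, the same idea can be extended to one row per sequence. The following is only a sketch: it assumes the PWM is a list of per-position dicts keyed by nucleotide, and the helper name pwm_to_row, the ID "seq1", and the sample frequencies are all made up for illustration. It writes to an in-memory buffer so it can be run without any input files.

```python
import csv
import io

ORDER = "ACGT"  # explicit column order instead of relying on dict ordering


def pwm_to_row(seq_id, pwm_columns, order=ORDER):
    """Flatten a PWM (assumed: list of per-position dicts) into one CSV row."""
    row = [seq_id]
    for column in pwm_columns:
        for letter in order:
            row.append(column[letter])  # one frequency per cell
    return row


# Hypothetical two-position PWM.
pwm = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

buffer = io.StringIO()
writer = csv.writer(buffer)
# writerow() writes a single row; writerows() expects a list of rows.
writer.writerow(pwm_to_row("seq1", pwm))
print(buffer.getvalue())
```

Each frequency then lands in its own cell, and the column order is the same for every row, which makes the file easier to read back.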