I have a csv file containing matrices of nucleotide frequencies for each position in an alignment. Essentially, I have a PWM saved as a csv file. Unfortunately, my script saved each set of frequencies in one cell. For example, in cell B1, I have {A: 0.25, C: 0.25, G: 0.25, T: 0.25}. What I need is for each cell to be more like: B1; {A:0.25} B2; {C:0.25} B3; {G:0.25} B4; {T:0.25}. Is there a way to split each frequency into its own cell like this?
The python code that I used to write to the csv:
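import csv
from Bio import Motif, SeqIO
from Bio.Alphabet import IUPAC

alphabet = IUPAC.unambiguous_dna
m = Motif.Motif(alphabet)
writer = csv.writer(open('filename.csv', 'wb', buffering=0))
for seq_record in SeqIO.parse("filename.fasta", "fasta", alphabet=alphabet):
    m.add_instance(seq_record.seq)
    PWM = m.pwm()
    # One row holding the record ID, then one row with one cell per position
    writer.writerows([[seq_record.id], PWM])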
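For reference, here is a minimal sketch of how the write step could put one frequency in each cell. It assumes that m.pwm() returns a list of dicts, one per alignment position, which is what the cell contents above suggest. The helper name pwm_to_rows, the sample values, and the row layout (record ID, position, then one "letter:frequency" cell per nucleotide) are hypothetical. It is written for Python 3 and uses an in-memory buffer instead of a real file.

```python
import csv
import io

def pwm_to_rows(record_id, pwm, letters="ACGT"):
    """Return CSV rows: one row per alignment position, one frequency per cell."""
    rows = []
    for position, freqs in enumerate(pwm, start=1):
        # Each cell holds a single "letter:frequency" pair, e.g. "A:0.25".
        cells = ["%s:%s" % (letter, freqs[letter]) for letter in letters]
        rows.append([record_id, position] + cells)
    return rows

# Stand-in for m.pwm(): assumed to be a list of dicts, one per position.
example_pwm = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

buffer = io.StringIO()  # in-memory file so the sketch runs without touching disk
csv.writer(buffer).writerows(pwm_to_rows("seq1", example_pwm))
print(buffer.getvalue())
```

Because every cell is now a short string with no comma in it, csv.writer has no reason to quote anything and each pair lands in its own column.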
Could it be that you are opening the resulting csv file in Excel without specifying a comma as the delimiter? Otherwise, "A: 0.25, C: 0.25, G: 0.25, T: 0.25" should be split across different cells, since the values are separated by commas.
You can add an extra step here. When you open the file in Excel, select all the data, then on the "Data" tab click "Text to Columns" and choose "Comma" as the delimiter. This should split the data into 4 different cells.
Great idea. I'll try that and see what happens.
Oddly enough, specifying a comma as the delimiter didn't affect the format. Changing it to ' ' or '}' (both commonly occurring in my file) did though. I'm not really sure why the comma didn't.
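One possible explanation, assuming the file was written by csv.writer with its default settings: a value that itself contains commas is wrapped in double quotes, so the whole dict is stored as a single quoted field and a comma delimiter does not break it apart. If so, the existing file can be repaired outside Excel. The sketch below (Python 3) first reproduces the quoting in memory, then reads the data back and splits each dict-like cell into separate cells. The helper names split_cell and split_rows and the sample data are hypothetical. Only cells that start with "{" are touched.

```python
import csv
import io
import re

# Matches one "letter: number" pair, with or without quotes around the letter.
PAIR = re.compile(r"'?([ACGT])'?\s*:\s*([0-9.eE+-]+)")

def split_cell(cell):
    """Split one dict-like cell into a list of 'letter:frequency' strings."""
    if not cell.strip().startswith("{"):
        return [cell]  # e.g. a sequence ID: keep it unchanged
    pairs = PAIR.findall(cell)
    return ["%s:%s" % pair for pair in pairs] if pairs else [cell]

def split_rows(rows):
    """Expand every cell of every row so each frequency gets its own cell."""
    return [[part for cell in row for part in split_cell(cell)] for row in rows]

# Reproduce the problem in memory: the dict is written as one quoted field.
freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
old_file = io.StringIO()
csv.writer(old_file).writerows([["seq1"], [str(freqs)]])
print(old_file.getvalue())

# Read it back (csv.reader strips the quotes) and split the cells.
reader = csv.reader(io.StringIO(old_file.getvalue(), newline=""))
fixed_rows = split_rows(reader)
print(fixed_rows)
```

To apply this to a real file, the two StringIO objects would be replaced with open('filename.csv', newline='') for reading and a second file opened with 'w' and newline='' for writing the fixed rows.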