Split Data From A Csv Cell Into Smaller Cells
13.3 years ago
RossCampbell ▴ 140

I have a csv file containing matrices of nucleotide frequencies for each position in an alignment. Essentially, I have a PWM saved as a csv file. Unfortunately, my script saved each set of frequencies in one cell. For example, in cell B1, I have {A: 0.25, C: 0.25, G: 0.25, T: 0.25}. What I need is for each cell to be more like: B1; {A:0.25} B2; {C:0.25} B3; {G:0.25} B4; {T:0.25}. Is there a way to split each frequency into its own cell like this?

The Python code that I used to write to the csv:

import csv
from Bio import SeqIO, Motif
from Bio.Alphabet import IUPAC

alphabet = IUPAC.unambiguous_dna
m = Motif.Motif(alphabet)
writer = csv.writer(open('filename.csv', 'wb', buffering=0))
for seq_record in SeqIO.parse("filename.fasta", "fasta", alphabet=alphabet):
    m.add_instance(seq_record.seq)
    PWM = m.pwm()
    writer.writerows([[seq_record.id], PWM])
format parsing python • 5.2k views

Could it be that you open the resulting csv file in Excel without specifying comma as the delimiter? Otherwise "A: 0.25, C: 0.25, G: 0.25, T: 0.25" should be split into different cells, since the values are separated by commas.


You can add an extra step here. When you open the file in Excel, select all the data, then in the "Data" tab click "Text to Columns" and choose "Comma" as the delimiter. This should split the data into 4 different cells.


Great idea. I'll try that and see what happens.


Oddly enough, specifying a comma as the delimiter didn't change the layout, but changing it to ' ' or '}' (both of which occur frequently in my file) did. I'm not sure why the comma had no effect.
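
One possible explanation, offered as a guess rather than a diagnosis: Python's csv writer converts each frequency dictionary to a string and, because that string contains commas, wraps the whole thing in quotes, so a csv-aware reader treats it as a single cell. The sketch below uses a made-up dictionary (freqs) and sequence id in place of the real PWM output, and Python 3 with an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical stand-in for one position of the PWM
freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

buffer = io.StringIO()
writer = csv.writer(buffer)
# The dict is converted with str(), so it ends up in a single field
writer.writerow(["seq1", freqs])
raw_line = buffer.getvalue()

# The field holding the dict is quoted because it contains commas
print(raw_line)

# Reading the line back gives two cells (id + dict text), not five
cells = next(csv.reader(io.StringIO(raw_line)))
print(len(cells))
```

If this is what happened in the original file, the commas inside the quotes are protected from the delimiter, while characters such as ' ' or '}' still split the text when chosen as delimiters.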

13.3 years ago

What you will need to do is to format the data structure that you get out from the pwm() method. You should also use the writerow() method rather than the writerows() one. Finally you might want to impose an order rather than relying on a default dictionary order.

Your code will then look roughly like this (I don't have similar code at hand, so this is untested):

row = [ sequenceid ]
for letter in "ATGC":
    row.append( pwmdata[letter] )
writer.writerow( row )

where pwmdata is a dictionary-like object that holds your frequencies.
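
To make the idea above concrete, here is a minimal, self-contained sketch. It assumes the PWM can be treated as a list with one dictionary per alignment position, which is a guess about what pwm() returns. The helper name pwm_rows, the example data and the column layout are all hypothetical, and the code targets Python 3 and writes to an in-memory buffer rather than a file.

```python
import csv
import io

def pwm_rows(seq_id, pwm, letters="ACGT"):
    """Build one row per position: id, position, then one frequency per letter."""
    rows = []
    for position, freqs in enumerate(pwm, start=1):
        # Look each letter up explicitly so the column order is fixed
        rows.append([seq_id, position] + [freqs[letter] for letter in letters])
    return rows

# Made-up data: a list of per-position dictionaries (an assumption)
example_pwm = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0},
]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "position"] + list("ACGT"))  # header row
writer.writerows(pwm_rows("seq1", example_pwm))     # one row per position
print(out.getvalue())
```

Each frequency now sits in its own cell, and passing letters="ATGC" would reproduce the column order used in the snippet above.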
