Calculate the average of two numbers in a line in Python?
8.6 years ago
Kevin_Smith ▴ 10

How can I calculate the average of columns 1 and 2, and do the same for each line, with Python?

     1        2

chrX 153706121 153706381 10 chrX 153706065 153706547 26 260 chrX 153705996 153706564 64 260

chrX 153993742 153993999 10 chrX 153993486 153994032 16 257 chrX 153993524 153994054 51 257

chrY 11920805 11921481 34 chrY 11920737 11921423 46 618 chrY 11920001 11921739 148 676

chrY 12379363 12379922 33 chrY 12379100 12379843 31 480 chrY 12379092 12380100 49 559

Thank you very much!

software error sequence alignment gene ChIP-Seq • 8.5k views

What I'm trying to do is calculate the average of columns 1 and 2, 5 and 6, and 10 and 11 for each line, then take the average of those three averages and save the result in a BED file. This should be the center of the peaks, or of the region. So far I have this code, but it is not working: I think it only calculates the averages for the first line and not for all the lines in the file. What do you think is wrong?

n = open('overlap_peaks.bed', 'r')
for line in n:
    line = line.split("\t")
    ave1 = (int(line[1]) + int(line[2])) / 2.0
    ave2 = (int(line[5]) + int(line[6])) / 2.0
    ave3 = (int(line[10]) + int(line[11])) / 2.0
    average = (int(ave1) + int(ave2) + int(ave3)) / 3.0

Thanks!

I guess this isn't the 'complete' code? Are you using Python interactively? What's going wrong?

For example, if I do print average, it gives me just one number. The program should give a list of all the averages calculated for each line.

Also, if I do average.saveas('center.bed'), it gives me the error 'float' object has no attribute 'saveas'.
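
Both symptoms can be reproduced with a small sketch. It uses made-up numbers instead of the BED file, and the names `pairs`, `average`, `averages`, `last_only`, and `has_saveas` are illustrative only. A variable assigned inside a loop is overwritten on each pass, so only the last value survives, and a plain float has no `saveas` method.

```python
# Hypothetical (start, end) pairs standing in for lines of the BED file.
pairs = [(10, 20), (30, 50), (100, 101)]

for start, end in pairs:
    average = (start + end) / 2.0  # overwritten on every pass

# After the loop, only the value from the last pair is left.
last_only = average

# To keep every value, collect them while looping.
averages = []
for start, end in pairs:
    averages.append((start + end) / 2.0)

# A float carries no file-writing method such as saveas.
has_saveas = hasattr(average, "saveas")
```

Printing or writing inside the loop body, or collecting into a list as above, keeps one result per line.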

Okay, I'll take it you don't have a lot of experience with Python then. You may want to adapt something like this:

infile= open('overlap_peaks.bed', 'r')
outfile = open('center.bed', 'w')
for line in infile:
    line = line.split("\t")
    ave1 = (int(line[1]) + int(line[2])) / 2.0
    ave2 = (int(line[5]) + int(line[6])) / 2.0
    ave3 = (int(line[10]) + int(line[11])) / 2.0
    average= (int(ave1) + int(ave2) + int(ave3)) / 3.0
    outfile.write(average)

And then let the loop execute. But more commonly, you would put this in a script (and not use Python interactively). I suggest you first get comfortable with Python before using it for your research; otherwise mistakes will be made.

Yeah, I'm new to Python and I'm having a hard time with it. I just tried your code, but it gives me:

outfile.write(average) expected a string or other character buffer object

Oh damn, my bad, should have used

outfile.write(str(average) + "\n")

A resource I found particularly useful for getting started with Python is Codecademy: https://www.codecademy.com/ If you work your way through this interactive introduction, things will get easier.

(You'll never stop making mistakes for the rest of your coding life, but they'll become easier to understand and solve.)
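
Putting the corrected pieces from this thread together, a script version might look like the sketch below. The function names `region_center` and `write_centers` are made up for illustration, and the input is assumed to be tab-separated with at least 12 columns, as in the question. Unlike the snippets above, this sketch does not truncate the three midpoints with `int()` before averaging, so results can differ slightly when a midpoint ends in .5.

```python
def region_center(line):
    """Return the mean of the three interval midpoints on one tab-separated line."""
    fields = line.split("\t")
    mid1 = (int(fields[1]) + int(fields[2])) / 2.0
    mid2 = (int(fields[5]) + int(fields[6])) / 2.0
    mid3 = (int(fields[10]) + int(fields[11])) / 2.0
    return (mid1 + mid2 + mid3) / 3.0


def write_centers(in_path, out_path):
    """Write one center per non-blank input line, each on its own line."""
    # The with statement closes both files even if a line fails to parse.
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(str(region_center(line)) + "\n")
```

With the file names used in this thread, the call would be `write_centers('overlap_peaks.bed', 'center.bed')`. The output is a single column of numbers, one per input line.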

Thank you very much. Now it works !!!

8.6 years ago
line = line.split("\t")
ave = (int(line[1]) + int(line[2])) / 2.0

Column 1 and column 2 would be

line = line.split("\t")
ave = (int(line[0]) + int(line[1])) / 2.0

Unless OP already took counting from 0 into account...
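
To make the indexing concrete, here is a small sketch using the first four fields of the first sample line from the question, joined with tabs (the delimiter is an assumption). The variable names are illustrative only.

```python
# First four fields of the first sample line, tab-separated (assumed delimiter).
sample = "\t".join(["chrX", "153706121", "153706381", "10"])

fields = sample.split("\t")

first_column = fields[0]   # index 0 holds the chromosome name, not a number
start = int(fields[1])     # index 1 is the second column
end = int(fields[2])       # index 2 is the third column

center = (start + end) / 2.0
```

In this sample, `int(fields[0])` would raise a `ValueError` because the first column is text, which supports reading the question as asking for the average of the two positions.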

Yeah, it was a bit ambiguous to me if OP literally wanted the first two columns. From the example, I assumed the average of the positions (i.e., the center of the region).

Would make sense, although the table looks strange to me. I guess OP can now decide for himself what's appropriate.

8.6 years ago
wpwupingwp ▴ 120

If your file is too big, maybe you can try awk:

awk -F "\t" '{print ($2+$3)/2}' filename > output_file

OP also wants averages for columns 1 and 2, 5 and 6, and 10 and 11 (the genomic locations in his sample data). For this your code is easily adapted, of course. @OP: keep in mind that Python counts from 0, awk from 1.
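
The off-by-one between the two tools can be sketched in Python as follows. The helper `midpoints` is hypothetical: it takes awk-style 1-based column numbers and subtracts 1 to get Python list indices. Tab-separated input is assumed.

```python
def midpoints(line, pairs_1based):
    """Midpoint of each (start, end) column pair, given 1-based column numbers."""
    fields = line.rstrip("\n").split("\t")
    # awk's $2 is fields[1] in Python, so shift every column number down by one.
    return [
        (int(fields[s - 1]) + int(fields[e - 1])) / 2.0
        for s, e in pairs_1based
    ]
```

Under this convention, the Python indices 1 and 2, 5 and 6, 10 and 11 used earlier in the thread correspond to the awk columns `$2` and `$3`, `$6` and `$7`, `$11` and `$12`, so the call would be `midpoints(line, [(2, 3), (6, 7), (11, 12)])`.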

So do you think this approach is good for finding the average center of the peaks in the BED file?

Sure.
