<Data Munging> Convert Methylation Percentage to Number of Reads
1
0
Entering edit mode
9.2 years ago
wanziyi89 ▴ 60

Dear all, I have a file as shown below:

chrBase        chr       base    strand    coverage    freqC    freqT
chrLG1.6955    chrLG1    6955       R       12         0.00     100.00

I would like to print this as a new file with the following format :

Chr"\t"Position"\t"Coverage"\t"Numbers_of_reads_in_T

Which will give me

chrLG1 6955 12 12

Usually I will just do

awk'{print$1"/t"$2} input.txt > output.txt

but this time it involves some mathematical conversions. Can anyone here guide me on how to do this conversion in UNIX?

Many thanks in advance and I wish you a good Monday!

conversion methylation • 1.7k views
ADD COMMENT
3
Entering edit mode
9.2 years ago

awk doesn't have a round() function, at least not the last time I checked, so it'd be easier in python or perl. I hate perl (though it'd be shorter here!), so here's a couple lines of python:

#!/usr/bin/env python
import sys
import csv

for line in csv.reader(open(sys.argv[1], "r")) :
    frac = float(line[6])
    cov = int(line[4])
    nT = round(cov*frac/100)
    nC = cov-nT
    print("%s\t%s\t%i\t%i" % (line[1], line[2], cov, nT))

Note that the number of digits shown limits how precise this method can be, but there's no way around that.

Edit: I shouldn't code before my first cup of coffee. My original reply mixed 3 programming languages!

ADD COMMENT
2
Entering edit mode

Awk has rounding through using printf in place of print, but there's a gotcha for the unweary ... Awk uses C's sprintf function, which by default does 'even' rounding on tie breaks (rounding towards the nearest even number) ...

awk 'BEGIN {printf("%.0f\n", 0.5)}'
0
awk 'BEGIN {printf("%.0f\n", 1.5)}'
2

Depending on your C libraries, it's configurable (see here), but in my experience, few systems support this.

However, assuming the calculated proportion must basically fall back to an integer number of reads +- float arithmetic error, this shouldn't make any difference here. So...

tail -n+2 <yourfile> | awk '{printf("%s\t%s\t%s\t%.0f\n", $2, $3, $5, $5*$7/100)}
ADD REPLY
0
Entering edit mode

Good to know!

ADD REPLY

Login before adding your answer.

Traffic: 1758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6