sum counts in regions in bed format
4.1 years ago
qwzhang0601 ▴ 80

I have processed DNA methylation data and generated a file giving, for each CpG site, the number of reads that call the site methylated or unmethylated. Now I want to sum these read counts in sliding windows. Is there a tool that can do this easily? Briefly, for each window I need to save the number of CpG sites covered and the sums of methyCount and unMethyCount.

#the file I generated has the following format
ChrID position methyCount unMethyCount
1       13823   0       1
1       13828   1       0
1       529822  2       0

#expected format for each window
Window      #CpG_covered      sum_methyCount sum_unMethyCount   
....

Thanks

sequencing • 788 views

Hi,

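If I understood your question correctly, here is one way to do it:

Create the sliding windows you want over the genome with bedtools makewindows (-w sets the window size, -s the step), e.g. 20 kb windows. => File1.txt

Put your CpGs in BED format (chrom, start, end, methyCount, unMethyCount; for a single 1-based position, start = position - 1 and end = position). => File2.txt

Run bedtools intersect -a File1.txt -b File2.txt -wa -wb, then concatenate the window coordinates of each output line with a separator so that every window becomes a single string key, e.g. chr1:1234:1234

Run the aggregate function in R on that key to count the CpGs and sum the two count columns per window (see online examples).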
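
If you would rather skip the bedtools and R steps, the same per-window sums can be sketched in plain Python. This is only a minimal sketch under some assumptions: the function names parse_rows and sum_counts_in_windows are made up for illustration, the 20 kb window and 10 kb step are example values, positions are taken to be 1-based, and windows are half-open intervals starting at 0 on each chromosome.

```python
from collections import defaultdict


def parse_rows(lines):
    """Yield (chrom, position, methy, unmethy) from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and the header (non-numeric position).
        if len(fields) != 4 or line.startswith("#") or not fields[1].isdigit():
            continue
        chrom, pos, methy, unmethy = fields
        yield chrom, int(pos), int(methy), int(unmethy)


def sum_counts_in_windows(rows, window_size=20000, step=10000):
    """Return (chrom, start, end, n_cpg, sum_methy, sum_unmethy) per covered window."""
    # (chrom, window_start) -> [n_cpg, sum_methy, sum_unmethy]
    totals = defaultdict(lambda: [0, 0, 0])
    for chrom, pos, methy, unmethy in rows:
        pos0 = pos - 1  # assumption: input positions are 1-based
        # Window k spans [k * step, k * step + window_size).
        first = max(0, (pos0 - window_size) // step + 1)
        last = pos0 // step
        for k in range(first, last + 1):
            entry = totals[(chrom, k * step)]
            entry[0] += 1
            entry[1] += methy
            entry[2] += unmethy
    return [
        (chrom, start, start + window_size, *counts)
        for (chrom, start), counts in sorted(totals.items())
    ]


example = """ChrID position methyCount unMethyCount
1       13823   0       1
1       13828   1       0
1       529822  2       0
"""

result = sum_counts_in_windows(parse_rows(example.splitlines()))
for row in result:
    print(*row, sep="\t")
```

Windows that contain no CpG are simply not reported. With step equal to window_size you get non-overlapping bins instead of sliding windows. Chromosomes are sorted as strings here, so "10" comes before "2".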


What have you tried on your own?
