4.1 years ago
qwzhang0601
I have processed my DNA methylation data and generated a file that gives, for each CpG site, the number of reads calling it methylated or unmethylated. Now I want to sum these read counts in sliding windows. Are there any tools that can do this easily? Briefly, for each window I need to report the number of CpG sites covered and the sums of methyCount and unMethyCount.
#The file I generated has the following format:
ChrID position methyCount unMethyCount
1 13823 0 1
1 13828 1 0
1 529822 2 0
#Expected output format, one line per window:
Window #CpG_covered sum_methyCount sum_unMethyCount
....
Thanks
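Tools such as bedtools expect BED input, which uses 0-based, half-open intervals, so the table above first needs converting. Below is a minimal Python sketch, assuming that `position` is 1-based (so the CpG at 13823 becomes the interval 13822 to 13823) and keeping the two count columns as BED columns 4 and 5; `table_to_bed` is a hypothetical helper name, not part of any tool.

```python
def table_to_bed(lines):
    """Convert 'ChrID position methyCount unMethyCount' rows to BED lines.

    Assumes 'position' is 1-based, so each CpG becomes the BED interval
    [position - 1, position); the two counts are kept as columns 4 and 5.
    """
    bed = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comment lines and the header row
        if not fields or fields[0].startswith("#") or fields[0] == "ChrID":
            continue
        chrom, position = fields[0], int(fields[1])
        methy, unmethy = fields[2], fields[3]
        bed.append(f"{chrom}\t{position - 1}\t{position}\t{methy}\t{unmethy}")
    return bed


if __name__ == "__main__":
    example = [
        "ChrID position methyCount unMethyCount",
        "1 13823 0 1",
        "1 13828 1 0",
        "1 529822 2 0",
    ]
    for record in table_to_bed(example):
        print(record)
```

For a real file, pass the open file object instead of the example list, since iterating a file yields its lines. The chromosome names (here `1` rather than `chr1`) must match those in the genome file used to build the windows, otherwise bedtools will find no overlaps.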
Hi,
If I understood your question correctly, here is one way to do it:
Create the sliding windows across the genome with bedtools makewindows (e.g. -w 20000 for 20 kb windows, and -s to set the step) => File1.txt
Your CpG sites in BED format => File2.txt
Run bedtools intersect -a File1.txt -b File2.txt -wa -wb, then join the coordinates with a special character so that they become one string, e.g. chr1:1234:1234
Run the aggregate function in R to sum the counts per window (see online examples).
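The three steps above can be sketched in plain Python to show what the windowing, intersect and aggregate approach computes. This is a minimal sketch, not bedtools or R: `window_table` and `overlapping_window_starts` are hypothetical helpers, windows are assumed to start at 0 and advance by `step` (roughly what bedtools makewindows -w/-s produces, though its exact handling of chromosome ends may differ), and positions are assumed to be 1-based. Each window is keyed by a `chr:start:end` string, which plays the role of the concatenated grouping key.

```python
from collections import defaultdict


def overlapping_window_starts(pos0, size, step):
    """Starts of every window [k*step, k*step + size) that contains 0-based pos0."""
    # k*step <= pos0 < k*step + size  =>  (pos0 - size)/step < k <= pos0/step
    k_min = max(0, (pos0 - size) // step + 1)
    k_max = pos0 // step
    return [k * step for k in range(k_min, k_max + 1)]


def window_table(cpgs, chrom_sizes, size, step):
    """Sum CpG counts per sliding window.

    cpgs: iterable of (chrom, position, methyCount, unMethyCount), 1-based positions.
    chrom_sizes: {chrom: length}, like the genome file given to bedtools makewindows.
    Returns rows (window_key, n_cpg, sum_methy, sum_unmethy) for windows with >= 1 CpG.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for chrom, position, methy, unmethy in cpgs:
        pos0 = position - 1  # 0-based, as in BED
        for start in overlapping_window_starts(pos0, size, step):
            end = min(start + size, chrom_sizes[chrom])  # truncate at chromosome end
            entry = totals[(chrom, start, end)]
            entry[0] += 1
            entry[1] += methy
            entry[2] += unmethy
    rows = []
    for (chrom, start, end), (n_cpg, sum_m, sum_u) in sorted(totals.items()):
        # One string per window, used as the grouping key
        rows.append((f"{chrom}:{start}:{end}", n_cpg, sum_m, sum_u))
    return rows


if __name__ == "__main__":
    cpgs = [("1", 13823, 0, 1), ("1", 13828, 1, 0), ("1", 529822, 2, 0)]
    chrom_sizes = {"1": 1_000_000}  # placeholder length; use your genome's real sizes
    print("Window\t#CpG_covered\tsum_methyCount\tsum_unMethyCount")
    for row in window_table(cpgs, chrom_sizes, size=20000, step=10000):
        print("\t".join(map(str, row)))
```

With 20 kb windows and a 10 kb step, every CpG falls into two overlapping windows, so the first two example CpGs both count towards `1:0:20000` and `1:10000:30000`. As with bedtools intersect -wa -wb, windows that contain no CpG are simply not reported.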
What have you tried on your own?