Question

Define methylation level within X nucleotides

8.9 years ago

thefirstrealace ▴ 30

Hello everybody,

I have a BED file containing methylation levels at certain coordinates, generated from BS-Seq data of human spleen cells. Here is a small part of its content:

chr1    10468    id-20250951    0.773585
chr1    10469    id-20250952    0.773585
chr1    10470    id-20250953    0.750000
chr1    10471    id-20250954    0.750000
chr1    10483    id-20250955    0.918033
chr1    10484    id-20250956    0.918033
chr1    10488    id-20250957    0.830769
chr1    10489    id-20250958    0.830769
chr1    10492    id-20250959    0.805556
chr1    10493    id-20250960    0.805556
chr1    10496    id-20250961    0.896104
chr1    10497    id-20250962    0.896104

I need to calculate the methylation levels within 20-nucleotide bins along a certain part of the genome. Let's consider the first six entries (coordinates 10468 - 10484) for our first 20 nt bin: how is the methylation level defined? Do I have to sum up the first six methylation values and divide by 20 or by 6?

I also heard from a friend that it might be defined as the sum of the first six methylation values divided by the total number of cytosines within the 20 nt bin.

Can someone please shed light on this?

Best regards

methylation-level bed digitize • 1.7k views

updated 2.3 years ago by Ram 44k • written 8.9 years ago by thefirstrealace ▴ 30

Ram · Answer 1 · 2015-12-26

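That's not a BED file; it's some custom format. You can likely make a bedGraph file out of it with awk 'BEGIN{OFS="\t"} {$3 = $2+1; print $0}' input.bed > output.bed (this overwrites the ID column with an end coordinate of position + 1, leaving chromosome, start, end and value).
Don't listen to your friend; he/she is wrong.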

With a bedGraph file, you can make a bigWig file and then use either bigWigSummary or pyBigWig (if you prefer scripting in Python). Either of these can directly output the average methylation of adjacent 20-base bins in some region.
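As a rough sketch of the pyBigWig route, assuming the bigWig file already exists and pyBigWig is installed: the helper names `bin_starts` and `binned_means` and the file name in the usage note are made up for illustration. Coordinates are taken as 0-based, half-open, as in bigWig, and any partial bin at the end of the region is dropped.

```python
def bin_starts(start, end, bin_size=20):
    """Start coordinates of the complete bins that fit inside [start, end)."""
    n_bins = (end - start) // bin_size
    return [start + i * bin_size for i in range(n_bins)]


def binned_means(bigwig_path, chrom, start, end, bin_size=20):
    """Mean signal per adjacent bin, as a list of (bin_start, mean) pairs."""
    import pyBigWig  # third-party package, assumed to be installed

    starts = bin_starts(start, end, bin_size)
    if not starts:
        return []
    last_end = starts[-1] + bin_size  # trim the region to whole bins
    bw = pyBigWig.open(bigwig_path)
    try:
        # exact=True asks for statistics from the full-resolution data
        # rather than from precomputed zoom levels.
        means = bw.stats(chrom, start, last_end,
                         type="mean", nBins=len(starts), exact=True)
    finally:
        bw.close()
    return list(zip(starts, means))
```

A call might look like `binned_means("methylation.bw", "chr1", 10467, 10507)`. As far as I know, bins with no data come back as `None` rather than 0, which is the behaviour you want here.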

For what it's worth, the average methylation is the sum divided by the number of entries present in a region. If one were to include positions for which there's no entry, then one would be saying that such positions are 0% methylated. This would obviously be a terrible idea.
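To make the definition concrete, here is a minimal sketch in plain Python that works directly on the four-column format from the question. The names `parse_line` and `bin_means` are illustrative, and placing the first bin at 10468 (via `origin`) is an assumption taken from the question's example.

```python
from collections import defaultdict


def parse_line(line):
    """Split one 'chrom  pos  id  value' line into (chrom, pos, value)."""
    chrom, pos, _id, value = line.split()
    return chrom, int(pos), float(value)


def bin_means(records, bin_size=20, origin=0):
    """Mean methylation per bin: sum of values / number of entries present.

    Positions without an entry are ignored, not counted as 0.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos, value in records:
        bin_start = origin + ((pos - origin) // bin_size) * bin_size
        sums[(chrom, bin_start)] += value
        counts[(chrom, bin_start)] += 1
    return {key: sums[key] / counts[key] for key in sums}


lines = """\
chr1    10468    id-20250951    0.773585
chr1    10469    id-20250952    0.773585
chr1    10470    id-20250953    0.750000
chr1    10471    id-20250954    0.750000
chr1    10483    id-20250955    0.918033
chr1    10484    id-20250956    0.918033
chr1    10488    id-20250957    0.830769
chr1    10489    id-20250958    0.830769
chr1    10492    id-20250959    0.805556
chr1    10493    id-20250960    0.805556
chr1    10496    id-20250961    0.896104
chr1    10497    id-20250962    0.896104
""".splitlines()

means = bin_means((parse_line(l) for l in lines), bin_size=20, origin=10468)
for (chrom, start), mean in sorted(means.items()):
    print(chrom, start, start + bin_size_used if False else start + 20, round(mean, 6))
```

For the first bin this gives 4.883236 / 6, or about 0.814. Dividing by 20 instead would give about 0.244, which is the "missing positions are 0% methylated" mistake.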