Question

Is there an informative subset of 450k methylation probes?

9.7 years ago

rmccloskey ▴ 240

I'm working on an analysis which includes 450k methylation data. There are so many probes that analysing the whole data set is becoming a problem in terms of time and memory. I'm sure that nearby methylation sites are highly correlated, so is there some kind of informative subset of the whole probe set that I could use to reduce computational costs without losing too much information? I'm aware that it's possible to do this myself using clustering or a similar approach, but I was hoping it had been done already.

methylation 450k illumina • 2.3k views

Updated 2.5 years ago by Ram 44k • written 9.7 years ago by rmccloskey ▴ 240
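For readers wondering what the do-it-yourself reduction mentioned in the question might look like, here is a minimal sketch, not an established or published probe subset. It assumes probes are already sorted by chromosome and position, and it drops a probe when it sits close to the last kept probe and its beta values are highly correlated with it. The function names, the probe IDs, and the `max_gap` and `min_r` thresholds are all hypothetical choices for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def prune_correlated_neighbours(probes, max_gap=1000, min_r=0.9):
    """Return IDs of probes kept after dropping near-duplicate neighbours.

    probes: list of (probe_id, chrom, pos, betas), sorted by chrom then pos.
    A probe is dropped if it is on the same chromosome as the last kept
    probe, within max_gap bp of it, and correlated with it at r >= min_r.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for probe in probes:
        if kept:
            _, last_chrom, last_pos, last_betas = kept[-1]
            _, chrom, pos, betas = probe
            if (chrom == last_chrom
                    and pos - last_pos <= max_gap
                    and pearson(betas, last_betas) >= min_r):
                continue  # redundant with the last kept probe
        kept.append(probe)
    return [p[0] for p in kept]


# Toy data with made-up probe IDs and beta values across four samples.
toy_probes = [
    ("probe_a", "chr1", 100, [0.10, 0.50, 0.90, 0.30]),
    ("probe_b", "chr1", 150, [0.12, 0.52, 0.88, 0.31]),  # tracks probe_a closely
    ("probe_c", "chr1", 400, [0.90, 0.10, 0.50, 0.70]),  # nearby but not correlated
    ("probe_d", "chr2", 120, [0.10, 0.50, 0.90, 0.30]),  # different chromosome
]
print(prune_correlated_neighbours(toy_probes))
```

Because each probe is only compared with the last one kept, this is a single pass over the data and needs just two probes in memory at a time if the input is streamed. The trade-off is that it is greedy: it is not a clustering, and how much information it discards depends entirely on the thresholds chosen.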

Ram · Answer 1 · 2015-03-30

9.7 years ago

Charles Warden 8.3k

While the correlation of nearby CpG sites is an assumption made in the probe design, I think it is best to take advantage of as much information as possible.

I (and others) have done some work on defining differentially methylated regions from 450k data. I have some analysis templates for a couple of programs here:

How many samples do you need to analyze? For small cell line datasets, I think the above tools should be OK for most desktops (but I agree that large patient cohorts may need to be run on a more powerful Linux cluster).

Updated 2.5 years ago by Ram 44k • written 9.7 years ago by Charles Warden 8.3k

I have 389 samples. I agree that it's best to use as much information as possible, but I'm already running on a cluster and am still having memory issues.

9.7 years ago by rmccloskey ▴ 240
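As a rough sanity check on the memory question, the size of one dense copy of the beta-value matrix can be estimated with simple arithmetic. The sketch below assumes about 485,577 probes on the 450k array and the 389 samples mentioned above. The function name is hypothetical, and the estimate covers a single in-memory copy only, not whatever intermediate objects a given analysis tool creates.

```python
def dense_matrix_bytes(n_probes, n_samples, bytes_per_value=8):
    """Bytes needed for one dense probes-by-samples matrix."""
    return n_probes * n_samples * bytes_per_value


N_PROBES = 485_577   # assumed probe count for the 450k array
N_SAMPLES = 389      # sample count stated in the thread

for label, width in (("float64", 8), ("float32", 4)):
    gib = dense_matrix_bytes(N_PROBES, N_SAMPLES, width) / 2**30
    print(f"{label}: {gib:.2f} GiB per copy")
```

With these assumed numbers a single float64 copy comes to roughly 1.4 GiB. If that is far below the memory actually being used, the pressure may be coming from multiple copies, raw intensity data, or per-probe model objects rather than from the matrix itself, though that is a guess and would need profiling to confirm.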