10.7 years ago
devolver1
I'd like to use a TCGA methylation dataset, but some of the data is from the Illumina 27k platform and some from the 450k. Is there a simple script to take only the common features? I read about the type I and type II probes on the 450k array and how to correct the type II probes, but that doesn't address whether it's even possible to take the common probes on the 27k and 450k platforms. Thank you.
I'd be tempted to use only the 450k since it includes 90% of the 27k probes and has far greater coverage. Unless there is some reason for using both, e.g. to compare the platforms?
TCGA has only so many subjects for any particular tumor type. If I stick to the 450k array alone, I get to use only half of the subject data, and for machine learning purposes that is a big loss. If I take the common probes for all subjects, 90% in common doesn't sound too bad. Thank you for the response.
What correction did you perform on the type II probes? I'm using the same data, but I assumed that since they are level 3 data, no other correction was needed. I wrote a simple script to extract the common features, but after reading your post I started wondering if I'm doing something wrong.
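For reference, here is a minimal sketch of the "take only the common probes" step discussed above. It assumes each platform's beta values have already been loaded into a dictionary keyed by probe ID, with one inner dictionary of sample ID to beta value per probe. The function name `merge_on_common_probes`, the probe IDs, and the sample IDs are all made up for illustration and are not from TCGA. It only intersects probe IDs; it does not attempt any type II correction or adjustment for differences between the two platforms.

```python
def merge_on_common_probes(beta_27k, beta_450k):
    """Keep only probes present on both platforms and join their samples.

    Each argument maps probe ID -> {sample ID -> beta value}.
    Sample IDs are assumed to be unique across the two inputs.
    """
    # Probe IDs found on both arrays, sorted for a reproducible order.
    shared = sorted(set(beta_27k) & set(beta_450k))
    merged = {}
    for probe in shared:
        row = dict(beta_27k[probe])    # copy the 27k samples
        row.update(beta_450k[probe])   # add the 450k samples
        merged[probe] = row
    return merged


# Toy data: probe and sample IDs are hypothetical.
beta_27k = {
    "cg0001": {"S1": 0.10, "S2": 0.85},
    "cg0002": {"S1": 0.40, "S2": 0.55},
    "cg0003": {"S1": 0.90, "S2": 0.20},
}
beta_450k = {
    "cg0001": {"S3": 0.12},
    "cg0003": {"S3": 0.88},
    "cg0004": {"S3": 0.50},
}

merged = merge_on_common_probes(beta_27k, beta_450k)
print(sorted(merged))
```

In this toy case only the two probes shared by both dictionaries survive, each carrying values for all three samples. It may also be worth keeping a per-sample platform label alongside the merged table so that any platform effect can be checked later.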