450k methylation array - 2 raw intensity values for same probe
5.2 years ago
neuro3030 ▴ 50

I have downloaded raw intensities for a 450k methylation array from GEO (Gene Expression Omnibus), in tab-delimited text format. However, some of the probes appear more than once.

Example for Sample 1:

There are two rows for this probe, one for Address A and one for Address B:

  1. cg00000622 (Address = 11642304)
  2. cg00000622 (Address = 38691301)

However, each row has both a methylated and an unmethylated intensity:

  1. methylated = 907, unmethylated = 10835
  2. methylated = 120, unmethylated = 735

How do I reduce this to a single methylated value and a single unmethylated value for this probe, so that I can calculate the beta value?

Is this a type I probe or is one of them a control probe?

Thanks for any help

450k methylation array illumina • 1.8k views

This is a two-color array. You should get some background on the platform before diving into the analysis, e.g. https://www.bioconductor.org/help/course-materials/2015/BioC2015/methylation450k.html


Thank you. But how does one arrive at a single methylated intensity and a single unmethylated intensity for type I probes? For type II probes, the methylated signal is in the green channel and the unmethylated signal in the red. For type I, am I to interpret a single channel as carrying both the methylated and the unmethylated signal? It's a bit confusing.

If red and green do not equate to methylated and unmethylated for type I probes, how does one calculate the beta value from them? Do you sum both methylated signals to get M, and sum both unmethylated signals to get U?
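For reference, a minimal Python sketch of how the bead intensities are usually combined. It assumes the convention that, as far as I know, minfi's `preprocessRaw` follows: for a type I probe, both signals are read in the probe's single colour channel, with the unmethylated signal taken from the Address A bead and the methylated signal from the Address B bead; for a type II probe there is one bead, read as green = methylated and red = unmethylated. The function names, the dictionary layout, and all addresses and intensities below are hypothetical, and the offset of 100 is the commonly used Illumina default. Whether the "methylated" and "unmethylated" columns in a given GEO file are really per-bead channel readings is something to verify against the array manifest.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset)."""
    return meth / (meth + unmeth + offset)


def probe_signals(probe, intensities):
    """Return (M, U) for one probe.

    probe: dict with 'type' ('I' or 'II'), 'address_a', and for type I
           also 'address_b' and 'color' ('red' or 'green').
    intensities: {address: {'red': value, 'green': value}}
    """
    if probe["type"] == "II":
        # One bead: green is read as methylated, red as unmethylated.
        bead = intensities[probe["address_a"]]
        return bead["green"], bead["red"]
    # Type I: two beads, both read in the probe's own colour channel.
    channel = probe["color"]
    meth = intensities[probe["address_b"]][channel]
    unmeth = intensities[probe["address_a"]][channel]
    return meth, unmeth


# Hypothetical probes and intensities, for illustration only.
intensities = {
    1001: {"red": 1000.0, "green": 200.0},
    1002: {"red": 3000.0, "green": 150.0},
    2001: {"red": 500.0, "green": 4500.0},
}
type_i = {"type": "I", "address_a": 1001, "address_b": 1002, "color": "red"}
type_ii = {"type": "II", "address_a": 2001}

for name, probe in [("type I", type_i), ("type II", type_ii)]:
    m, u = probe_signals(probe, intensities)
    print(name, m, u, round(beta_value(m, u), 4))
```

Under this convention the signals from the two beads are not summed: each bead contributes exactly one of M or U, and the other channel of a type I bead is not used for the beta value.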

5.2 years ago

I wouldn't normally calculate this myself.

If you have access to the .idat files, you can use GenomeStudio and/or minfi to test various normalizations when calculating beta values.

I thought GEO required submitters to provide both intensity and beta values. Without the .idat files, though, it is harder to test alternative processing.


Thanks. However, I do have a specific problem:

The raw intensity values I downloaded from GEO are in the format described above. Every other 450k raw-intensity file I have downloaded from GEO so far contains the usual 485,577 probes. This particular data set, however, contains 617,984 rows, because the type I probes are listed with both intensity values: methylated and unmethylated values for each of the red and green channels.

This causes a "duplicate" rows error when I import the data into minfi. How do I fix this?

Thanks
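Before importing, it may help to confirm which probe IDs occur on more than one row. A minimal Python sketch using only the standard library; the column name `ID_REF`, the other header names, and the inline sample rows are assumptions for illustration, so adjust them to the actual header of the GEO file.

```python
import csv
import io
from collections import Counter


def duplicated_ids(handle, id_column="ID_REF"):
    """Return {probe_id: count} for IDs that appear on more than one row."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row[id_column] for row in reader)
    return {probe: n for probe, n in counts.items() if n > 1}


# Hypothetical excerpt; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "ID_REF\tAddress\tSignal_1\tSignal_2\n"
    "cg00000001\t111\t10\t20\n"
    "cg00000001\t222\t30\t40\n"
    "cg00000002\t333\t50\t60\n"
)
dups = duplicated_ids(sample)
print(dups)
```

If every duplicated ID has exactly two rows with different addresses, that is consistent with the file listing one row per bead rather than one row per probe.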


I apologize for the delayed response.

I changed the formatting to try to make myself a little clearer.

Time-permitting, I will see if I can understand what you are describing better by looking at another data set.

So, I am not 100% certain what to suggest, but I am surprised that beta values were not already provided (and I think the best solution is if you can re-process the raw .idat files).


Maybe there is something that I am not taking into consideration, but I just checked the intensity values for this dataset, and there are 485,577 rows.

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE42nnn/GSE42308/suppl/GSE42308_HY_450k_signal_intensity.txt.gz

So, I am not sure whether there are multiple valid ways to submit intensity values to GEO, but your file is different from the data that I have uploaded.
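The row count above can be checked without loading the whole file into memory. A minimal Python sketch, assuming the download is a gzip-compressed text file with a single header line; the function name and the small temporary demo file are illustrative only.

```python
import gzip
import os
import tempfile


def count_data_rows(path, header_lines=1):
    """Count lines in a gzip-compressed text file, excluding the header."""
    with gzip.open(path, "rt") as handle:
        total = sum(1 for _ in handle)
    return max(total - header_lines, 0)


# Demo on a small temporary file standing in for the GEO download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_signal_intensity.txt.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("ID_REF\tSample1\n")
        out.write("probe_a\t1\nprobe_b\t2\nprobe_c\t3\n")
    n_rows = count_data_rows(demo_path)
    print(n_rows)
```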
