QC of Illumina 450K Data from GEO with lumi
11.2 years ago
sea.array ▴ 20

Dear all,

I am planning to work with several Illumina 450K methylation datasets from the Gene Expression Omnibus (GEO) database. I want to use an R package to do normalization and other QC steps, as well as to remove batch effects. I've tried lumi, methylumi and minfi.

The problem is that I get errors when reading the files available at GEO with lumi/methylumi/minfi. If I understand correctly, the input file for these packages is the output file of GenomeStudio (the Final Report). However, neither this file nor the original .idat files are in GEO.

The files in the GEO database are:
1- one matrix with beta values for all individuals (series matrix)
2- one file with methylated and unmethylated probe signal intensities (in some cases p-values too)
3- RAW data containing: manifest_header_descriptions, csv, bpm files

My questions are:
1- How can I convert the files in GEO into the input file that lumi/methylumi/minfi need for the QC steps? Any preferences among the packages?
2- In case I have to build the input file myself, where can I find a template of the GenomeStudio output file (including the COLOR_CHANNEL column)?
3- How can I combine different GEO datasets to perform a joint QC assessment?

Thank you for your help!

methylation geo normalization • 4.7k views
10.7 years ago

I've only tested those programs using .idat files as inputs. ArrayExpress sometimes provides the .idat files (and I think you can also get the .idat files for TCGA samples, which won't be in GEO), but getting them will be an issue when the data come from GEO.

COHCAP can provide QC stats from the FinalReport file, and you could specify the batches as pairing IDs to correct for batch effects in the statistical analysis (as a 2-way ANOVA, for example). It doesn't really do other types of normalization, although in my opinion the background correction and other normalization techniques within GenomeStudio should be sufficient.

You can either run COHCAP as a standalone program or as a Bioconductor package:

http://sourceforge.net/projects/cohcap/

http://bioconductor.org/packages/devel/bioc/html/COHCAP.html

I have a Protocol Exchange listing specifically for analyzing 450K data with the COHCAP Bioconductor package:

http://www.nature.com/protocolexchange/protocols/2965

The only difference is that you'll want to skip the .idat processing instructions and use the FinalReport.txt file in place of the "minfi.txt" file. I believe the following document provides instructions for getting the FinalReport.txt file into the right format (except that you should not export the detection p-values for the Bioconductor package; those should only be used for the standalone version): https://docs.google.com/uc?id=0B1xpw6_kQMKuVm1kS3V6dlJBbmc&export=download&revid=0B1xpw6_kQMKuQ25yS0RDQ3JlYWNKTEk0THRGZmxQQjVGTU40PQ

The Protocol Exchange listing also provides templates and simple benchmarks for some other tools that can define differentially methylated regions. There is probably some way to import your FinalReport.txt file into minfi, but I can't give you specific instructions right now. If you aren't satisfied with COHCAP, you can also check whether RnBeads accepts GenomeStudio output as an input format.
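
If all you have from GEO is the file of methylated and unmethylated signal intensities, the beta values can at least be recomputed from it before any QC. Below is a minimal Python sketch of that step, using only the standard library; it is not part of COHCAP, lumi, methylumi or minfi. The function name `betas_from_signals`, the `ID_REF` probe column, and the "<sample> Methylated signal" / "<sample> Unmethylated signal" column naming are assumptions about how a given submitter laid out the file, so check the actual header and adjust the arguments. The offset of 100 is the constant commonly used in the Illumina beta-value definition, beta = M / (M + U + 100).

```python
import csv
import io


def betas_from_signals(handle,
                       meth_suffix=" Methylated signal",      # assumed column naming
                       unmeth_suffix=" Unmethylated signal",  # assumed column naming
                       id_column="ID_REF",                    # assumed probe ID column
                       offset=100.0,
                       delimiter="\t"):
    """Return {sample: {probe_id: beta}} from a per-probe signal-intensity table.

    Values that are missing or not numeric (for example "NA") are stored as None.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    # Sample names are recovered from the methylated-signal column headers.
    samples = [name[:-len(meth_suffix)] for name in reader.fieldnames
               if name.endswith(meth_suffix)]
    betas = {sample: {} for sample in samples}
    for row in reader:
        probe = row[id_column]
        for sample in samples:
            try:
                # Negative intensities are truncated to zero before the ratio.
                m = max(float(row[sample + meth_suffix]), 0.0)
                u = max(float(row[sample + unmeth_suffix]), 0.0)
            except (KeyError, ValueError, TypeError):
                betas[sample][probe] = None
                continue
            betas[sample][probe] = m / (m + u + offset)
    return betas


# Toy input standing in for a GEO signal-intensity file (made-up values).
example = io.StringIO(
    "ID_REF\tS1 Unmethylated signal\tS1 Methylated signal\n"
    "cg00000029\t900\t1000\n"
    "cg00000108\t300\t0\n"
)
betas = betas_from_signals(example)
print(betas["S1"]["cg00000029"])  # 1000 / (1000 + 900 + 100) = 0.5
```

For a real file you would pass an open file handle instead of the `io.StringIO` toy input. Recomputed betas like these have not been through background correction or normalization, so they are only a starting point for QC.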
