I have a large microarray dataset (360 samples) run on a number of chips, and I am trying to analyze it. Currently I am using the Bioconductor lumi package. I have analyzed smaller sample sets before.
I want to make sure the data are of good quality and that running all the samples together is appropriate. Can someone point me to a tutorial, or to a list of quality control checks, that would help me confirm this?
I don't recall if lumi offers all of the same normalization methods as GenomeStudio.
It's been a little while, but these are the QC metrics that I remember for Illumina expression arrays:
1. Compare sample signal distributions and look for outliers (if I recall, even 'quantile normalization' in GenomeStudio wasn't purely quantile normalization, because the distributions still differed afterwards).
2. Look for outliers using PCA, hierarchical clustering, etc.
3. Compare clustering with different normalization methods. More specifically, compare how your groups of interest cluster under different conditions.
4. If I recall, background subtraction was important, and I liked to see how the results differed for 'no normalization' versus 'quantile normalization'. I think I always skipped the imputation step.
5. Also, I always defined each sample as its own group (in GenomeStudio, not for statistical analysis). I seem to remember this being a bigger problem for the methylation arrays than the expression arrays, but I still think it is important. You can figure this out on your own with enough permutations of #3, but I didn't want to do this each time I had a dataset to analyze.
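As a rough illustration of the distribution and outlier checks above, here is a minimal sketch in plain Python (chosen only to keep the example self-contained; in practice you would do this in R on the real expression matrix). The function names, the toy data, and the 0.8 correlation cutoff are all assumptions made for illustration, not anything taken from lumi or GenomeStudio.

```python
from statistics import mean, median

def quantile_normalize(samples):
    """Quantile-normalize samples given as equal-length lists of probe intensities."""
    n_probes = len(samples[0])
    # Reference distribution: mean across samples of the r-th smallest value.
    sorted_samples = [sorted(s) for s in samples]
    reference = [mean(s[r] for s in sorted_samples) for r in range(n_probes)]
    normalized = []
    for s in samples:
        # Probe indices ordered by intensity (ties broken by position, a simplification).
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def flag_outliers(samples, threshold=0.8):
    """Indices of samples whose median correlation with the others is below threshold.

    The threshold is an arbitrary illustrative value, not a recommended cutoff.
    """
    flagged = []
    for i, s in enumerate(samples):
        cors = [pearson(s, t) for j, t in enumerate(samples) if j != i]
        if median(cors) < threshold:
            flagged.append(i)
    return flagged

# Toy data: three samples with similar profiles and one with a reversed profile.
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.5, 2.5, 3.5, 4.5, 6.0],
    [0.8, 2.2, 2.9, 4.1, 5.2],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
normalized = quantile_normalize(toy)
outliers = flag_outliers(normalized)
print(outliers)
```

Note that after quantile normalization every sample has the same distribution of values, so a density plot alone will no longer reveal the odd sample; a correlation or clustering check still does.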
Depending upon your study design, you may also need to apply a batch correction.
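To make the idea of a batch correction concrete, here is a deliberately simple sketch in plain Python: per-probe mean-centering within each batch. This is an assumption-laden toy, not what any particular package does; established tools such as `removeBatchEffect` in limma or `ComBat` in sva use more careful models. The function name, batch labels, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def center_by_batch(samples, batches):
    """Subtract each batch's per-probe mean, then add back the overall per-probe mean."""
    n_probes = len(samples[0])
    overall = [mean(s[p] for s in samples) for p in range(n_probes)]
    # Group samples by their batch label.
    by_batch = defaultdict(list)
    for s, b in zip(samples, batches):
        by_batch[b].append(s)
    batch_means = {
        b: [mean(s[p] for s in group) for p in range(n_probes)]
        for b, group in by_batch.items()
    }
    return [
        [s[p] - batch_means[b][p] + overall[p] for p in range(n_probes)]
        for s, b in zip(samples, batches)
    ]

# Toy data: two probes, four samples, two chips with a clear offset between them.
samples = [[1.0, 10.0], [3.0, 12.0], [5.0, 20.0], [7.0, 22.0]]
batches = ["chipA", "chipA", "chipB", "chipB"]
corrected = center_by_batch(samples, batches)
print(corrected)
```

This only makes sense when your groups of interest are spread across batches. If a batch contains only one condition, centering like this would remove the biological difference along with the batch effect.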
The limma package is based on empirical Bayes statistics (eBayes()) and linear model fitting.
Are all 360 samples from the same platform? If so, normalization should be easier, and as long as you have enough computing power to process them you should be fine with limma. It has a number of quality assessment tools that let you check the data after normalization and confirm you are ready for the analysis.
Also, how many conditions do you have? limma can handle as many as you wish, as long as you create the correct design matrix (basically, to tell R which samples are which).
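For anyone unsure what a design matrix looks like, here is a small sketch in plain Python of a one-column-per-condition indicator matrix. It is only meant to show the structure; in R you would normally build it with `model.matrix`. The function name and the condition labels are made up for illustration.

```python
def design_matrix(groups):
    """Build a no-intercept indicator matrix: one row per sample, one column per condition."""
    levels = sorted(set(groups))
    rows = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, rows

# Hypothetical condition labels, one per sample, in the same order as the arrays.
groups = ["control", "treated", "control", "treated", "relapse"]
levels, design = design_matrix(groups)
print(levels)
for row in design:
    print(row)
```

Each row has a single 1 marking the condition that sample belongs to, which is how the model knows which samples are which.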
Hope this helps :)
Thanks a lot :)
I still have trouble making plots in R for these 360 samples. See above. Any help/suggestion would be great!