QC And Preprocessing Gene Expression Microarrays - When Is A "Data Set" A Single Data Set?
13.0 years ago
Evansa • 0

Hello everyone,

I am new to microarray analysis. I am attempting to do basic QC for gene expression data obtained from Affymetrix HG-U133A chips, before exploring different methods of normalization.

I have been working through various tutorials online and have been looking at output from the qc function in Bioconductor's simpleaffy package and the fitPLM function in affyPLM.

My data set contains some poor-quality image files. Raw probe intensity density plots reveal a number of outliers, and a number of CEL files fail QC metrics such as the 3'/5' beta-actin and GAPDH ratios.

I wish to exclude such files from downstream analysis.

However, from Gentleman, R. et al. (2005), Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Springer, and Gentleman, R. (2007), Some Quality methods for affymetrix microarrays (http://www.bioconductor.org/help/course-materials/2007/biocadv/Labs/AffyQuality/AffyQuality.pdf), I see that RNA degradation plots and NUSE values are not comparable across data sets. I guess this applies to other metrics as well.

How loosely can "data set" be applied?

My "data set" consists of 150 CEL files from independent samples, all from a single study, processed and run on numerous dates over the course of several years.

Is it valid to perform these QC procedures treating my CEL files as a single data set? If not, are there any valid methods of performing QC in such situations?

Thanks in advance for any assistance.

microarray qc gene bioconductor • 4.4k views
13.0 years ago
boczniak767 ▴ 870

Hi Evansa,

I think a "dataset" (also in view of Gentleman's article) is simply a collection of data obtained in a given experiment, i.e. from arrays that were hybridized within a short period of time, using the same batches of reagents and the same conditions (everything that could conceivably change by the next experiment).

I'd advise you to remove outliers. You can also check whether your data show a batch effect.
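
To make the "judge each array against its own run" idea concrete, here is a minimal sketch in plain Python (not a Bioconductor workflow). It assumes you have already exported one QC number per array, for example a median NUSE, together with a batch label such as the hybridization run. The function name, the array names, the values and the 3.5 cutoff (a common rule of thumb for median/MAD scores) are all made up for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_outliers_within_batch(metric, batch, threshold=3.5):
    """Return arrays whose QC metric is extreme relative to their own batch.

    metric: dict of array name -> QC value (hypothetical per-array summary)
    batch:  dict of array name -> batch label (e.g. hybridization run)
    """
    by_batch = defaultdict(list)
    for name, label in batch.items():
        by_batch[label].append(name)

    flagged = []
    for names in by_batch.values():
        values = [metric[n] for n in names]
        center = median(values)
        mad = median([abs(v - center) for v in values])
        if mad == 0:
            continue  # no spread in this batch, so nothing can be judged
        for n in names:
            # 0.6745 rescales the MAD so the score is comparable to a z-score
            robust_z = 0.6745 * (metric[n] - center) / mad
            if abs(robust_z) > threshold:
                flagged.append(n)
    return sorted(flagged)


# Made-up values: run "B" sits higher overall, but only "a5" is odd for its run.
metric = {"a1": 1.00, "a2": 1.01, "a3": 0.99, "a4": 1.02, "a5": 1.20,
          "b1": 1.10, "b2": 1.11, "b3": 1.09, "b4": 1.12}
batch = {name: name[0].upper() for name in metric}
print(flag_outliers_within_batch(metric, batch))
```

With only a handful of arrays per run the median and MAD are themselves unstable, so treat the flags as candidates to inspect rather than as automatic exclusions.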

HTH Maciej


Hi Maciej,

Thank you for your reply.

This is as I had feared. The arrays were processed over several years and forty runs.

Are there QC strategies that one can apply to such data in order to remove outliers, as NUSE plots and density histograms can't validly be applied to my data as a whole?

I was intending to look for batch effects downstream of QC and normalization, but am not sure if analysis of these CEL files is feasible, or if it is something that should be attempted?

I am doubtful, but hopeful. I would be grateful for advice, even if that advice is that analysis is not achievable.


If the data look good within runs (each run probably corresponds to a replication), you can try specifying batch as a blocking factor in your analysis.
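
As a rough illustration of what blocking buys you, the sketch below (plain Python, one gene, made-up numbers) compares a naive pooled group difference with the average of within-batch differences. The labels "control" and "treated", the function names and the values are assumptions for the example. In R this idea is usually expressed by adding batch to the model formula, e.g. `~ batch + group`, instead of computing it by hand.

```python
from collections import defaultdict
from statistics import mean


def naive_group_effect(values, group):
    """Treated minus control, ignoring batch entirely."""
    treated = [v for v, g in zip(values, group) if g == "treated"]
    control = [v for v, g in zip(values, group) if g == "control"]
    return mean(treated) - mean(control)


def blocked_group_effect(values, group, batch):
    """Average of the within-batch treated-minus-control differences.

    Batches that lack either group are skipped, since they carry no
    within-batch contrast.
    """
    per_batch = defaultdict(lambda: {"control": [], "treated": []})
    for v, g, b in zip(values, group, batch):
        per_batch[b][g].append(v)
    diffs = [mean(d["treated"]) - mean(d["control"])
             for d in per_batch.values()
             if d["treated"] and d["control"]]
    if not diffs:
        raise ValueError("no batch contains both groups")
    return mean(diffs)


# Run 2 is shifted upward and holds more treated arrays (unbalanced layout).
values = [5.0, 5.5, 6.25, 7.0, 7.75, 8.25]
group = ["control", "control", "treated", "control", "treated", "treated"]
batch = ["run1", "run1", "run1", "run2", "run2", "run2"]

print(naive_group_effect(values, group))           # inflated by the run shift
print(blocked_group_effect(values, group, batch))  # the shift cancels within runs
```

This only works if the groups are spread across runs. If a run contains a single group, batch and group are confounded for those arrays and no model can separate them.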
