Identify Expressed Genes From Combined Microarray Data Sets
13.1 years ago
Jessica ▴ 70

Hi all,

I have to combine two datasets obtained on two different platforms (Illumina and Affymetrix). The combined dataset contains gene expression for 11 cell types. For my purpose, I do not need to find the genes differentially expressed in one cell type relative to the others; I need to find the upregulated genes of each cell type. To do this, I ranked the ~20,000 genes within each sample and selected genes that ranked in the top 20% in at least 80% of the replicates of each cell type (all cell types have >=5 replicates). However, I am not sure how to estimate the statistical significance (e.g., FDR) of my selected genes. Any advice is appreciated. Also, does anybody know of any methods that suit my purpose?

Thank you very much.

Wendy
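For readers who want to see the selection rule described above in concrete form, here is a minimal sketch in plain Python. The function names, the dict-per-sample data layout, and the default thresholds are assumptions made for illustration; they are not taken from the original analysis.

```python
from collections import defaultdict


def top_fraction_genes(sample_values, top_fraction=0.20):
    """Return the genes ranked in the top `top_fraction` of one sample.

    sample_values: dict mapping gene name -> expression value (hypothetical layout).
    """
    n_top = int(len(sample_values) * top_fraction)
    ranked = sorted(sample_values, key=sample_values.get, reverse=True)
    return set(ranked[:n_top])


def consistently_top_ranked(samples, cell_types,
                            top_fraction=0.20, min_replicate_fraction=0.80):
    """Select, per cell type, genes that are top-ranked in enough replicates.

    samples: list of dicts (gene -> expression), one per sample.
    cell_types: list of cell-type labels, parallel to `samples`.
    """
    counts = defaultdict(lambda: defaultdict(int))  # cell type -> gene -> hits
    n_replicates = defaultdict(int)                 # cell type -> replicate count
    for values, cell_type in zip(samples, cell_types):
        n_replicates[cell_type] += 1
        for gene in top_fraction_genes(values, top_fraction):
            counts[cell_type][gene] += 1

    selected = {}
    for cell_type, n in n_replicates.items():
        selected[cell_type] = {
            gene for gene, hits in counts[cell_type].items()
            if hits >= min_replicate_fraction * n
        }
    return selected
```

Note that this only reproduces the ranking heuristic; by itself it gives no p-value or FDR, which is exactly the gap the question asks about.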

microarray data • 3.8k views

up-regulated relative to what?


Maybe you can use ComBat from the sva package on Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/sva.html). It can help to merge data sets from different batches with different conditions, and it also contains functions for p-value calculation. The problem is that you might find it difficult to map the probe IDs to generate the required data structure.
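The probe-mapping step mentioned above can be sketched as follows. This is only an illustration of collapsing probes to genes and keeping the genes shared by both platforms; it does not perform ComBat or any other batch correction. The probe IDs, the annotation dicts, and the choice of averaging multiple probes per gene are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene (one sample).

    probe_values: dict probe ID -> expression value.
    probe_to_gene: dict probe ID -> gene symbol (hypothetical annotation).
    Probes with no annotation are skipped.
    """
    per_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:
            per_gene[gene].append(value)
    return {gene: mean(values) for gene, values in per_gene.items()}


def shared_gene_table(platform_a, platform_b):
    """Keep only genes measured on both platforms.

    Returns dict gene -> (value on platform A, value on platform B).
    """
    shared = sorted(set(platform_a) & set(platform_b))
    return {gene: (platform_a[gene], platform_b[gene]) for gene in shared}
```

Averaging is just one way to handle several probes per gene; taking the most variable or highest-intensity probe is another common choice, and which is appropriate depends on the data.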

13.1 years ago

Gene expression measurements on a microarray are not absolute (that is, a gene with a high expression value may or may not have more RNA in the cell than another gene with a lower expression value), so ranking genes by their expression measures does not make much sense. Also, I would not be surprised if the top-ranked genes from your described method overlap considerably between cell types.

Without knowing your biological question, it is hard to tell you what to do, but I'd suggest that you look for cell-type-specific genes. For that, one can typically use hypothesis testing methods across samples; with multiple classes (cell types), this is often done using an F-statistic. The two-platform design limits what can be done, but I think in the end across-sample, within-gene hypothesis testing is a more established way to go.
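As a rough illustration of across-sample, within-gene testing, the sketch below computes a one-way ANOVA F-statistic for a single gene across cell types and then adjusts a list of per-gene p-values with the Benjamini-Hochberg procedure. To stay dependency-free it takes p-values from a label-permutation test rather than from the F distribution; that substitution, the function names, and the defaults are assumptions for illustration. It also ignores the platform effect, which would need to be handled separately.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups (lists of floats)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Permutation p-value for one gene: shuffle cell-type labels, recompute F."""
    rng = random.Random(seed)

    def f_for(current_labels):
        groups = {}
        for value, label in zip(values, current_labels):
            groups.setdefault(label, []).append(value)
        return f_statistic(list(groups.values()))

    observed = f_for(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_for(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one would call `permutation_p_value` once per gene with that gene's values across all samples and the matching cell-type labels, then pass the resulting list to `benjamini_hochberg` to control the FDR across genes.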


Hi Sean, is it possible to do a Wilcoxon signed-rank test for the expressed and non-expressed genes and apply a multiple-testing correction across cell lines?


Hi Sean, I am trying to find the ligand and receptor genes expressed by each cell type, out of a ligand/receptor database. The selected ligand and receptor genes do not have to be cell-type-specific. Given this aim, do you have any further comments? Would you mind elaborating on across-sample, within-gene hypothesis testing? What sort of statistical methods should I look into? Thank you.


Hi Sean, I am wondering, for gene expression measurements, why might a gene with a high expression value not have more RNA?


For a given probe or probeset, characteristics like binding affinity, cross-hybridization with non-target molecules, potentially mRNA secondary structure, and many other factors may affect the operating characteristics of the probe. Unfortunately, those effects are not identical for all different probes on the array, making comparisons between probes problematic. See this figure, for example:

http://nar.oxfordjournals.org/content/39/suppl_1/D1011/F1.expansion.html


Note that there are methods for determining which genes are expressed in a sample: http://nar.oxfordjournals.org/content/39/suppl_1/D1011.full


Thanks a lot Sean.
