Question

Fold Change Detection In Affymetrix Microarray Data Without Replicates

12.2 years ago

elb ▴ 260

Dear users, I have a question about microarray data analysis (Affymetrix, one-color). My problem is that I have just one TREATMENT sample and one REFERENCE sample. Neither technical nor biological replicates are available. A statistical test for differentially expressed genes between the two conditions (even a simple t-test) seems impossible to me because there are no replicates. The people who asked me to do the analysis are only interested in finding the genes that change between the two conditions. Under these conditions, I think fold change is the only option, and only to give a general view of how the genes behave. Any other suggestions on this issue?

Thanks a lot

Ele

microarray bioconductor r • 5.7k views

Istvan Albert · score 4 · Answer 1 · 2012-09-10

As Will pointed out, there is only so much you can say about such data.

Perhaps you can find a subset of genes that are known to be unaffected by the treatment, then use those to build an empirical distribution of the expected variation. You can then use that distribution to estimate the likelihood of observing a given difference in expression for the remaining genes. This is still weak, since it will be strongly affected by errors.
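Here is a minimal Python sketch of this idea. It assumes log2-scale values for the two arrays (for example, after RMA) and a boolean mask marking the genes you believe are unaffected. The function name and the simulated data are illustrative only. The resulting numbers are empirical tail probabilities, and they are only as trustworthy as the choice of control genes.

```python
import numpy as np

def empirical_pvalues(log2_treat, log2_ref, control_mask):
    """Two-sided empirical p-value for every gene's log2 difference.

    The null distribution is built only from genes flagged in
    control_mask, which are ASSUMED to be unaffected by the treatment.
    """
    diff = np.asarray(log2_treat, float) - np.asarray(log2_ref, float)
    null = np.sort(np.abs(diff[np.asarray(control_mask, bool)]))
    obs = np.abs(diff)
    # Number of control genes whose |difference| is >= each gene's |difference|.
    n_as_extreme = null.size - np.searchsorted(null, obs, side="left")
    # The +1 terms keep p-values strictly above zero.
    return (n_as_extreme + 1) / (null.size + 1)

# Simulated example: 1,000 genes, one array per condition.
rng = np.random.default_rng(0)
ref = rng.normal(8.0, 2.0, size=1000)
treat = ref + rng.normal(0.0, 0.3, size=1000)
treat[:5] += 3.0                  # five genes given an artificial 8-fold increase
controls = np.zeros(1000, dtype=bool)
controls[100:300] = True          # hypothetical "unaffected" control genes
p = empirical_pvalues(treat, ref, controls)
```

With 200 control genes, the smallest possible value is 1/201. This means the resolution of these "p-values" is limited by how many genes you are willing to treat as unaffected.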

I agree with Istvan's suggestion. You can pick a few housekeeping genes: take genes that appear as housekeeping genes in different studies and use them to calculate the expected variation.

Reply by Ashutosh Pandey · 12.2 years ago

Good idea. I'm interested in the technical details of such a procedure. Do you have a reference at hand?

Reply by Woa · 12.2 years ago

You will probably be hard-pressed to find a reference on how to do statistical analysis with no replicates.

What is described is just basic statistics. Create a histogram of the differences, then either fit it with a normal distribution or just look at the percentage of values within a certain distance. That gives you the probability of observing a given difference. Then compare how many genes you actually observe beyond that difference with how many you would expect from your empirical distribution.
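Both variants described here (fitting a normal, or counting the fraction of control differences beyond a distance) can be sketched as follows. This assumes the differences are log2 ratios and that the control genes define what "no change" looks like. The function name `observed_vs_expected` and the threshold of 1.5 are illustrative choices, not a standard procedure, and the data are simulated.

```python
import math
import numpy as np

def observed_vs_expected(diff, control_mask, threshold):
    """Count genes with |diff| > threshold and compare with the number
    expected if every gene behaved like the control genes."""
    diff = np.asarray(diff, float)
    ctrl = diff[np.asarray(control_mask, bool)]
    n = diff.size

    # Variant 1: fit a normal distribution to the control differences.
    mu, sigma = ctrl.mean(), ctrl.std(ddof=1)
    upper = 0.5 * math.erfc((threshold - mu) / (sigma * math.sqrt(2)))  # P(X > t)
    lower = 0.5 * math.erfc((threshold + mu) / (sigma * math.sqrt(2)))  # P(X < -t)
    expected_normal = (upper + lower) * n

    # Variant 2: use the empirical fraction of control genes beyond the threshold.
    expected_empirical = float(np.mean(np.abs(ctrl) > threshold)) * n

    observed = int(np.sum(np.abs(diff) > threshold))
    return observed, expected_normal, expected_empirical

rng = np.random.default_rng(1)
diff = rng.normal(0.0, 0.3, size=1000)   # simulated log2 ratios with no real change
diff[:5] += 3.0                          # five genes with an artificial 8-fold change
controls = np.zeros(1000, dtype=bool)
controls[100:300] = True                 # hypothetical control genes
observed, exp_normal, exp_emp = observed_vs_expected(diff, controls, threshold=1.5)
```

If the observed count is much larger than the expected count, the excess gives a rough idea of how many genes may really have changed. Without replicates, this is a heuristic rather than a formal test.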

Reply by Biostar User · 12.2 years ago (updated by Istvan Albert)

Sorry, I don't have a reference. But there are some normalization methods that use housekeeping genes:

http://wiki.c2b2.columbia.edu/workbench/index.php/Normalization#Housekeeping_Genes_Normalizer

Alternatively, you can simply use the expression of the housekeeping genes to get an idea of how much variation there is between the two datasets. From that, you can come up with a fold-change threshold above which genes are called differentially expressed. But I don't think you can calculate p-values at any point.
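A rough sketch of the threshold approach follows. It assumes log2-scale values and a hypothetical boolean mask of housekeeping genes. The first step shifts the treatment array so that the housekeeping genes have a median log ratio of zero. This is one simple form of housekeeping-based normalization, and not necessarily what the linked normalizer does. The 95% quantile is an arbitrary choice.

```python
import numpy as np

def housekeeping_threshold(log2_treat, log2_ref, hk_mask, quantile=0.95):
    """Return log2 fold changes, the cutoff, and a boolean 'called' mask."""
    t = np.asarray(log2_treat, float)
    r = np.asarray(log2_ref, float)
    hk = np.asarray(hk_mask, bool)

    # Centre the housekeeping genes: their median log2 ratio becomes 0.
    t = t - np.median(t[hk] - r[hk])
    lfc = t - r

    # The cutoff is the chosen quantile of |log2 FC| among housekeeping genes.
    cutoff = float(np.quantile(np.abs(lfc[hk]), quantile))
    return lfc, cutoff, np.abs(lfc) > cutoff

rng = np.random.default_rng(2)
ref = rng.normal(8.0, 2.0, size=1000)
treat = ref + 0.5 + rng.normal(0.0, 0.3, size=1000)  # 0.5 = simulated array-wide offset
treat[:5] += 3.0                                     # five genes with an artificial change
hk = np.zeros(1000, dtype=bool)
hk[100:300] = True                                   # hypothetical housekeeping genes
lfc, cutoff, called = housekeeping_threshold(treat, ref, hk)
```

By construction, about 5% of the housekeeping genes (and roughly 5% of any unchanged genes) exceed the cutoff. The result is therefore a threshold, not a p-value.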

Reply by Ashutosh Pandey · 12.2 years ago (updated by Istvan Albert)

score 3 · Answer 2 · 2012-09-11

Here are a few possibilities that you might consider:

1. Some kind of outlier analysis on the difference (or fold change) between treatment and reference. You might hope that important biological differences would stand out compared to differences arising from ordinary sample-to-sample variation.
2. You could pretend that these are RNA-seq samples and convert them to count-based data. People in the RNA-seq field run statistics (e.g., Fisher's exact test) on a single sample vs. a single sample all the time.
3. You could calculate the change in each gene's rank between treatment and reference. Then you could set up a random permutation test: randomly assign ranks to genes by drawing from the reference and treatment, and see whether the actual data contain any unusually "lucky" jumps in rank compared with the random simulations.
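Suggestion 1 could look like the sketch below. It flags outliers in the log2 fold change using a robust (median/MAD) z-score, so that the few genes that really change do not inflate the spread estimate. The 0.6745 scaling and the 3.5 cutoff are a common rule of thumb for the modified z-score, not something prescribed in this thread. The data are simulated.

```python
import numpy as np

def robust_outliers(log2_treat, log2_ref, cutoff=3.5):
    """Modified z-scores of log2 fold changes and a mask of |z| > cutoff."""
    m = np.asarray(log2_treat, float) - np.asarray(log2_ref, float)
    med = np.median(m)
    mad = np.median(np.abs(m - med))  # median absolute deviation
    z = 0.6745 * (m - med) / mad      # roughly standard-normal scale if m is normal
    return z, np.abs(z) > cutoff

rng = np.random.default_rng(3)
ref = rng.normal(8.0, 2.0, size=1000)
treat = ref + rng.normal(0.0, 0.3, size=1000)
treat[:5] += 3.0         # five genes with an artificial 8-fold change
z, flagged = robust_outliers(treat, ref)
```

`np.flatnonzero(flagged)` lists the candidate genes. The spread of log ratios usually depends on intensity, so one possible refinement is to compute the robust z-score separately within bins of average intensity.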

Just to be clear: these are all really bad options. Trying to come up with statistics with no replicates will likely just get you smacked down by a reviewer if you ever attempt to publish. Your original idea of just using fold changes is probably best, although I would also consider the overall expression level of each gene. You might put more weight on a gene with FC = 2 if it went from 10,000 to 20,000 than if it went from 1 to 2. Any candidates you identify will have to be validated before they are worth anything at all. Suggestion #1 above could also help you identify fold-change values that really stand out. Good luck!
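One way to fold the expression level into a fold-change ranking is to compute M (log2 fold change) and A (average log2 intensity) for each gene. You then rank by |M| only among genes above an intensity floor. This sketch assumes linear-scale intensities (e.g., MAS5-style signals). If your values are already log2, as RMA output is, drop the `np.log2` calls. The floor `min_a` is a hypothetical, arbitrary parameter. The toy data reuse the numbers from this answer.

```python
import numpy as np

def ma_values(treat, ref):
    """M = log2(treat / ref), A = mean of the two log2 intensities."""
    lt = np.log2(np.asarray(treat, float))
    lr = np.log2(np.asarray(ref, float))
    return lt - lr, (lt + lr) / 2

def rank_by_fold_change(treat, ref, min_a):
    """Gene indices ordered by decreasing |M|, dropping genes with A < min_a."""
    m, a = ma_values(treat, ref)
    keep = np.flatnonzero(a >= min_a)
    return keep[np.argsort(-np.abs(m[keep]), kind="stable")]

# Genes 0 and 1 are the example from the answer: both have FC = 2 (M = 1).
treat = [20000, 2, 5000, 300]
ref = [10000, 1, 5000, 100]
m, a = ma_values(treat, ref)
order = rank_by_fold_change(treat, ref, min_a=6.0)
print(order)  # [3 0 2]: gene 1 (1 -> 2) falls below the intensity floor
```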

score 2 · Answer 3 · 2012-09-10

In your case, I fear you can only assess your data by fold change (there are also solutions like PUMA), but the lack of replicates is not necessarily a dead end. Simple fold changes can be complemented with functional analysis of the most-changed genes, comparison with known contrasts (e.g., the FARO server), and mapping of the changed genes onto known pathways (KEGG, MapMan).
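One common form of the functional analysis mentioned here is an over-representation test. Suppose N genes are on the array, K of them belong to a pathway, and you pick the n most-changed genes, of which k fall in the pathway. The test computes the hypergeometric probability P(X >= k). The sketch below uses only the Python standard library. Pathway membership would have to come from a resource such as KEGG, nothing here is specific to FARO or MapMan, and testing many pathways would also call for a multiple-testing correction.

```python
from math import comb

def overrepresentation_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N total genes, K in pathway, n selected).

    N: genes measured on the array
    K: genes on the array annotated to the pathway
    n: genes selected as "most changed" (e.g., top by |log2 fold change|)
    k: selected genes that fall in the pathway
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Toy check: 10 genes, 3 in the pathway, 2 selected, both in the pathway.
print(overrepresentation_p(10, 3, 2, 2))  # 3/45, about 0.0667
```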

Generally, if the biological response is strong enough, you will see it, regardless of replicates and regardless of p-values. It will make sense if you put enough thought and time into it.

Good luck!

score 1 · Answer 4 · 2012-09-10

You're going to be hard-pressed to squeeze any results out of such a small dataset. Can you supplement your data with publicly available datasets from repositories like GEO or ArrayExpress?

If not, then you're limited to fold change. You can rank your results by fold change, but you'll get a lot of false positives.

elb · score 0 · Answer 5 · 2012-09-29

Hi guys, thank you very much for your valuable suggestions. What we did was use some genes as "housekeeping" genes, even though we were not completely sure those genes were unaffected by the treatment. Anyway, this was the best we could do under those conditions. Thank you again!
