Question

How to calculate differential gene expression from a single sample on an Affy GeneChip?

0

9.4 years ago

JacobS ▴ 990

I have a .CEL file from a GeneChip Human Genome U133 Plus 2.0 array. I would like to calculate expression fold change for all of the genes represented on it. Typically I work with RNA-Seq, and for an experiment like this I would have 2 samples, A and B, and at least 3 replicates of each. I would then build a count matrix, use the replicates to estimate dispersion, and then generate fold changes between the samples.

However, in this case I only have a single sample and want to determine the relative expression of each gene. Since I need some benchmark against which I can calculate fold changes for the genes (something to define fold change = 1), I am assuming I should use some subset of known housekeeping genes. However, I've never done this before and would like some advice from those more experienced.

Can someone please explain to me what needs to be done to go from a .CEL file to a list of fold change values for all genes in my sample, as determined by comparison to a housekeeping-determined baseline? Ideally, I would like to replicate this on samples in the future, and since their housekeeping baseline should be comparable with this first sample, hopefully I could make inferences on the DGE between the samples.

Thanks for any suggestions!

microarray genechip affy • 3.9k views

ADD COMMENT • link updated 2.0 years ago by Ram 44k • written 9.4 years ago by JacobS ▴ 990

0

You cannot compare expression levels between genes within a single array. There is a lot of variance between probes because of sequence content, fragment length, and the position of the probe on the mRNA (distance to the 3' end), so forget about it. There must be better ways to reach your goals.

ADD REPLY • link 9.4 years ago by Irsan ★ 7.8k

0

Is there perhaps some reference for the GeneChip Human Genome U133 Plus 2.0 array? For example, if a very vanilla prep of human cells were run on this chip and I could access that data, I'm thinking I could then compare my sample to that vanilla reference and determine which genes in my sample have fold changes relative to a normal human sample.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.4 years ago by JacobS ▴ 990

0

So I guess I'm asking if a very good human control is available for this chip, so that I can use my sample as the experimental condition and calculate DGE from that. Specifically, I'm looking for a normal gastric mucosa tissue reference.
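
To make the idea concrete, here is a minimal Python sketch of a sample-versus-reference comparison. It assumes both arrays have already been background-corrected and normalized together (for example with RMA) and exported as linear-scale intensities per probe set. The probe set names and values are made up, and `log2_fold_changes` is a hypothetical helper, not part of any package. With one array per condition this gives fold changes only, with no statistical support.

```python
import math

def log2_fold_changes(sample, reference):
    """Return log2(sample / reference) for probe sets present on both arrays."""
    shared = sample.keys() & reference.keys()
    return {probe: math.log2(sample[probe] / reference[probe])
            for probe in sorted(shared)}

# Made-up linear-scale intensities, for illustration only.
sample = {"probe_a": 1200.0, "probe_b": 300.0, "probe_c": 50.0}
reference = {"probe_a": 300.0, "probe_b": 300.0, "probe_d": 10.0}

fold_changes = log2_fold_changes(sample, reference)
for probe, lfc in fold_changes.items():
    print(f"{probe}\t{lfc:+.2f}")
```

Probe sets missing from either array are dropped rather than guessed at.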

ADD REPLY • link 9.4 years ago by JacobS ▴ 990

1

9.4 years ago

matt.newman ▴ 170

It's not ideal, but if you need to do this, I'd use GAPDH or β-actin (ACTB). The only concern is that these genes do change in some samples. But this would give you the level of expression for all genes relative to these housekeepers.

Long-term though, you really want to have more than one replicate.

ADD COMMENT • link updated 2.0 years ago by Ram 44k • written 9.4 years ago by matt.newman ▴ 170

0

I was more expecting to aggregate 50 - 100 housekeeping genes and use them to generate a more robust baseline. Any experience in doing that?
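
For illustration, this is roughly what I have in mind, as a minimal Python sketch. It assumes linear-scale, already normalized intensities keyed by gene symbol. The values are made up, and `log2_ratio_to_housekeeping` is a hypothetical helper rather than an existing package function. The baseline is the geometric mean of the housekeeping panel, computed as the mean of the log2 values.

```python
import math
from statistics import mean

def log2_ratio_to_housekeeping(signals, housekeeping):
    """Return log2(signal / geometric mean of the housekeeping panel) per gene."""
    present = [gene for gene in housekeeping if gene in signals]
    if not present:
        raise ValueError("none of the housekeeping genes are in the data")
    # The mean of log2 values equals log2 of the geometric mean.
    baseline = mean(math.log2(signals[gene]) for gene in present)
    return {gene: math.log2(value) - baseline for gene, value in signals.items()}

# Made-up intensities; a real panel would hold 50-100 genes.
signals = {"GAPDH": 8000.0, "ACTB": 2000.0, "GENE_X": 16000.0, "GENE_Y": 500.0}
ratios = log2_ratio_to_housekeeping(signals, ["GAPDH", "ACTB"])
for gene, ratio in ratios.items():
    print(f"{gene}\t{ratio:+.2f}")
```

Given the probe-to-probe variance mentioned above, I realize these ratios may only be meaningful when the same gene is compared across arrays, not when different genes are compared within one array.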

ADD REPLY • link 9.4 years ago by JacobS ▴ 990

0

I'm looking for a specific list of housekeeping genes to consider, as well as software packages that can take such a list and generate DGE.

ADD REPLY • link 9.4 years ago by JacobS ▴ 990

0

Here are a few lists of housekeeping genes that you might try: http://www.tau.ac.il/~elieis/HKG/, http://www.tau.ac.il/~elieis/Housekeeping_genes.html, http://www.sabiosciences.com/rt_pcr_product/HTML/PAHS-000A.html

We offer software to do it: http://www.omicsoft.com/array-studio. We have a full platform for NGS analytics, standard gene expression, and more.

ADD REPLY • link updated 2.0 years ago by Ram 44k • written 9.4 years ago by matt.newman ▴ 170

0

9.4 years ago

h.mon 35k

It is not clear to me: do you have 1) arrays comparing different groups (control vs experimental), but no replicates; 2) arrays for just one group, with replicates; or 3) just a single array?

I may be misunderstanding, but it seems to me you have one single array (no groups, no replicates). If this is the case, you cannot estimate fold-change nor dispersion. A typical array workflow is pretty much the same as you described for RNAseq.

edit: I agree with seidel's answer; it doesn't make sense to talk about fold change if you have just one group. As you have replicates, though, I think you can estimate the dispersion and see which genes are more and which are less variable than the average.
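
A minimal Python sketch of that last idea, assuming log2-scale, normalized values with one list of replicate measurements per gene. The gene names and numbers are made up, and `variability` is a hypothetical helper. The per-gene standard deviation stands in for "dispersion" here, which is a simplification of what dedicated packages estimate.

```python
from statistics import mean, stdev

def variability(log2_replicates):
    """Return {gene: (sd, more_variable_than_average)} across replicates."""
    sd = {gene: stdev(values) for gene, values in log2_replicates.items()}
    average_sd = mean(sd.values())
    return {gene: (value, value > average_sd) for gene, value in sd.items()}

# Made-up log2 intensities from three replicate arrays of the same group.
log2_replicates = {
    "GENE_1": [8.0, 8.0, 8.0],
    "GENE_2": [5.0, 7.0, 9.0],
    "GENE_3": [10.0, 10.5, 11.0],
}

result = variability(log2_replicates)
for gene, (sd, is_high) in result.items():
    label = "above average" if is_high else "at or below average"
    print(f"{gene}\tsd={sd:.2f}\t{label}")
```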

ADD COMMENT • link updated 2.0 years ago by Ram 44k • written 9.4 years ago by h.mon 35k

0

Thanks for your response. I have arrays for just one group, with replicates. For RNA-Seq, I always had an experimental and a control sample, and would calculate FC between them. In this case, I have just a single sample, and I want to calculate fold change with respect to housekeeping genes.

ADD REPLY • link updated 2.0 years ago by Ram 44k • written 9.4 years ago by JacobS ▴ 990

Ram · Accepted Answer · 2015-07-15

"...by comparison to a housekeeping-determined baseline?"

"I was more expecting to aggregate 50 - 100 housekeeping genes..."

For a single set of gene measurements (i.e. the results of a single Affy chip) there is no such thing. The light signals detected on the Affy chip reflect the expression level of each gene in the sample (the concentration of a given transcript). Thus you could rank all the genes by signal, but "housekeeping" genes are present at all signal levels, so in a single data set there is no general housekeeping baseline that could consist of more than one gene. One option is to choose a single gene as a reference point, but it's been pointed out that any single gene will change its rank (and its concentration) in a sample from experiment to experiment. Or you could make up some kind of ad hoc score based on a property of the data set, e.g. an average or median value, and calculate fold change against that. This is usually a property defined by the entire data set, but you could limit it to some class of genes; you'd be inventing a novel method, though, and there would be many caveats. There are also some controls spiked into Affy samples that might be of use. But it sounds like you have to think more carefully about what it is you're actually trying to achieve, as fold change based on a single sample doesn't make any sense.
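
Purely to illustrate the two things you can compute from one array, here is a minimal Python sketch of ranking by signal and of an ad hoc log2 ratio against the array median. It assumes linear-scale intensities keyed by gene. The names and values are made up, both helpers are hypothetical, and the caveats above apply in full.

```python
import math
from statistics import median

def rank_by_signal(signals):
    """Return {gene: rank}, with rank 1 for the strongest signal."""
    ordered = sorted(signals, key=signals.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}

def log2_ratio_to_median(signals):
    """Return log2(signal / median signal of the whole array) per gene."""
    center = median(signals.values())
    return {gene: math.log2(value / center) for gene, value in signals.items()}

# Made-up intensities from a single array.
signals = {"GENE_A": 100.0, "GENE_B": 400.0, "GENE_C": 1600.0}

ranks = rank_by_signal(signals)
scores = log2_ratio_to_median(signals)
for gene in signals:
    print(f"{gene}\trank={ranks[gene]}\tlog2_vs_median={scores[gene]:+.2f}")
```

The score only says where a signal sits within this one array; it is not a fold change between biological conditions.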