Question

I have two questions about spike-in data sets for microarray experiments


9.7 years ago

Avro

Hi everyone,

I am reading a book called "Statistics and Data Analysis for Microarrays Using R and Bioconductor". More specifically, I am looking at the limitations of microarrays, and I don't understand this sentence:

"The variance of average chip intensity among spike-in data sets is much lower than those measured in most real-life data sets, casting doubts on the general applicability of these data for developing analytical tools for highly diverse clinical expression profiles."

I have two questions:

1. I understand that spike-in data sets are controls that you include in your sample preparation, but how are they used when you analyze or transform your data?
2. What does the author mean by "the variance of average chip intensity among the spike-in data sets"? I know what variance is. For example, if I have 42 control genes, do I compute the average intensity of all of them on each array and then compute the variance of those averages across arrays?
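The computation described in the second question can be sketched as follows. This is a minimal illustration, not code from the book: it assumes each array is given as a plain list of probe intensities (the numbers are hypothetical), takes the mean intensity of each array, and then takes the sample variance of those per-array means.

```python
from statistics import mean, variance

def average_chip_intensity_variance(arrays):
    """Variance, across arrays, of each array's mean intensity.

    `arrays` is a list of arrays; each array is a list of probe
    intensities (for example, the 42 control genes on that chip).
    """
    per_array_means = [mean(intensities) for intensities in arrays]
    # Sample variance (n - 1 denominator) of the per-array averages
    return variance(per_array_means)

# Hypothetical intensities: 3 arrays x 4 control probes
arrays = [
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 210.0, 310.0, 410.0],
    [90.0, 190.0, 290.0, 390.0],
]
print(average_chip_intensity_variance(arrays))  # 100.0
```

The per-array means here are 250, 260 and 240, so their variance is small; the book's point is that in real clinical data sets this number tends to be much larger.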

Thank you!

Tag: microarray

Written 9.7 years ago by Avro; last updated 2.6 years ago by Ram.

Ram · Accepted Answer · 2015-08-25


9.7 years ago

JC

1. Spike-in sequences are used to scale intensities properly across chips. Suppose you have one spike-in gene on two chips: if chip 1 shows an expression level of 100 for this gene and chip 2 shows 200, you can scale all values on chip 1 by doubling them, or all values on chip 2 by halving them. In practice you have more than one spike-in sequence, at different concentrations, so you can adjust the intensity distributions with more sophisticated methods.
2. Yes. But the point is that spike-in sequences have lower variance than the real genes in your samples, so they are not a good model of real data when developing analysis tools.
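The one-spike-in, two-chip example above can be written out as a small sketch. It assumes each chip is a dictionary from probe ID to intensity and that the spike-in is called `spike1` (a hypothetical ID); the gene values are made up. Chip 1 is rescaled so that its spike-in matches chip 2.

```python
def scale_to_reference(chip, reference, spike_id):
    """Rescale every value on `chip` so its spike-in matches `reference`.

    `chip` and `reference` map probe IDs to intensities; `spike_id`
    names the spike-in probe present on both chips.
    """
    factor = reference[spike_id] / chip[spike_id]
    return {probe: value * factor for probe, value in chip.items()}

chip1 = {"spike1": 100.0, "geneA": 50.0, "geneB": 80.0}
chip2 = {"spike1": 200.0, "geneA": 120.0, "geneB": 150.0}

# Factor is 200 / 100 = 2.0, so every value on chip 1 is doubled
scaled_chip1 = scale_to_reference(chip1, chip2, "spike1")
print(scaled_chip1)  # {'spike1': 200.0, 'geneA': 100.0, 'geneB': 160.0}
```

Calling `scale_to_reference(chip2, chip1, "spike1")` instead halves chip 2, the other option mentioned above. Which chip serves as the reference is a free choice; what matters is that the spike-in values end up equal.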

Written 9.7 years ago by JC; last updated 2.6 years ago by Ram.


Hi! Thank you for your answers! So this is done so that we can compare mRNA expression between different platforms or conditions. It's all about normalization. I'm sorry, but I don't understand your last sentence about using different concentrations of the same sequence.

So, these spike-ins are only good for normalizing since they do not reflect the full spectrum of gene expression variability, right?

Thank you very much!

Written 9.7 years ago by Avro; last updated 2.6 years ago by Ram.


No, the spike-ins are several different sequences, each added at a known concentration.

And yes, they are good for normalization between samples; real sequences can be more variable.
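One possible way to use several spike-in sequences at known concentrations is a per-chip calibration curve. This is a minimal sketch, not necessarily the "more sophisticated method" meant above: it assumes log2(intensity) is roughly linear in log2(concentration), fits that line by ordinary least squares separately on each chip, and inverts it to express any intensity as an estimated concentration. The concentrations, intensities and helper names are hypothetical.

```python
import math
from statistics import mean

def fit_calibration(spike_intensities, spike_concentrations):
    """Least-squares fit of log2(intensity) = a + b * log2(concentration)
    for one chip, using its spike-ins with known concentrations."""
    xs = [math.log2(c) for c in spike_concentrations]
    ys = [math.log2(i) for i in spike_intensities]
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b

def to_concentration(intensity, a, b):
    """Invert the fitted curve to estimate a concentration from an intensity."""
    return 2 ** ((math.log2(intensity) - a) / b)

# Hypothetical design: five spike-in sequences at known concentrations
concentrations = [0.5, 1.0, 2.0, 4.0, 8.0]
chip1_spikes = [50.0, 100.0, 200.0, 400.0, 800.0]    # chip 1 reads 100 x concentration
chip2_spikes = [100.0, 200.0, 400.0, 800.0, 1600.0]  # chip 2 reads 200 x concentration

a1, b1 = fit_calibration(chip1_spikes, concentrations)
a2, b2 = fit_calibration(chip2_spikes, concentrations)

# A gene reading 300 on chip 1 and 600 on chip 2 maps to the same level
print(round(to_concentration(300.0, a1, b1), 6), round(to_concentration(600.0, a2, b2), 6))  # 3.0 3.0
```

Because every chip is mapped onto the same concentration scale, the factor-of-two difference between the chips disappears, which is the single-spike-in idea extended to many sequences.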

Written 9.7 years ago by JC.