Greetings
I am doing a bioinformatics project based on microarray gene expression data, and there are some basic issues I don't fully understand. I was hoping members of this forum may be able to help me. Please could you address the following points in turn:
1) Is there a naming convention for Affy probe sets? This is an example page from GEO and it seems as though the probe sets have a naming convention, but I cannot figure it out. Some names end in 'at' and others end in 'st'. Many names have '-5' or '-3' or 'M' in them too.
2) How can probes distinguish between mRNA that has and has not been processed (e.g. intron splicing)? Is this possible? I expect most researchers want to know about the processed mRNA (see next point).
3) How do probes in general account for the fact that genes can have specific transcript variants? Does the probe target a sequence common to all isoforms, or do you get different probes for the different transcripts? Examples would be helpful. I presume most researchers want to know which specific transcript variants are present in a cell.
4) How do probes account for sequence variations such as SNPs? A variation within a gene shouldn't affect the level of transcription of a gene (or should it?), but it could affect the binding of a transcript to a target probe. Are probe sequences designed such that they exclude known SNPs?
5) Probe sets contain a set of overlapping probes for a target sequence. Do you expect a target mRNA sequence to bind equally to each of these probes? Does the statistical analysis take into account the 'average' binding of an mRNA to all of the probes in a probe set to give a picture of the expression level of an mRNA?
Thank you for your time
It actually has nothing to do with PCR amplification, because PCR amplification is not used. Rather, the 3' bias comes from the step in which mRNA is converted into cDNA by reverse transcriptase (RT), using oligo-dT to complement the poly-A tail of the mRNA. RT is not very processive, and signal decays the further one gets from the poly-A tail. The amplification step comes from using T7 RNA polymerase, which linearly amplifies the cDNA from a T7 promoter that was part of the original oligo-dT cDNA primer. T7 is a more processive enzyme than RT, so the bias is mainly introduced in the RT step.
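To put rough numbers on "signal decays the further one gets from the poly-A tail", here is a toy Python sketch. It assumes RT has a fixed chance of falling off at every nucleotide, which is a simplification for illustration only, and the drop-off rate `DROP_PER_NT` is a made-up value, not a measured one.

```python
def fraction_reaching(distance_nt: int, drop_per_nt: float) -> float:
    """Fraction of cDNA molecules that extend at least `distance_nt`
    nucleotides from the poly-A tail, under a toy model in which RT
    falls off with the same probability at every nucleotide."""
    return (1.0 - drop_per_nt) ** distance_nt


# Hypothetical drop-off probability per nucleotide (illustration only).
DROP_PER_NT = 0.001

for distance in (100, 500, 1000, 2000):
    # Probes far from the 3' end see a smaller share of the cDNA molecules.
    print(distance, round(fraction_reaching(distance, DROP_PER_NT), 3))
```

Under this model the fraction only ever shrinks with distance, which is why probes designed close to the 3' end give the more reliable signal with oligo-dT primed protocols.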
Thank you kindly for your response.
1) The probe set naming link does not mention the significance of the '3' or '5' or 'M'.
2) and 3) Thank you for pointing out full exon arrays. I was not aware of these.
4) I am not interested in detecting variants; I was interested in how the chips 'work around' variants.
5) I was not aware of the amplification issue and I don't think I fully appreciate it. Doesn't PCR amplify the whole isolated mRNA sequence? Are you saying that PCR gives amplified fragments of different lengths, with more of the shorter fragments? I only have textbook knowledge of PCR.
No, it's not really a PCR problem per se. It is more that degradation of the mRNA sample, starting from one end, causes the effect. We have a graph and explanation at www.arrayanalysis.org. Click "sample prep controls" and then look for "Overall RNA quality control: RNA degradation plot" (and please be aware that this is just our implementation of an existing Bioconductor module).
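For anyone who wants to see the idea behind such a degradation plot, here is a minimal Python sketch. It is not the Bioconductor module or the arrayanalysis.org code; it only assumes that the probes within each probe set can be ordered from the 5' to the 3' end, averages the log2 intensity at each position over all probe sets, and fits a straight line. The function names and the simulated intensities are made up for illustration.

```python
import math
import random


def position_means(intensities):
    """Mean log2 intensity at each probe position, averaged over probe sets.

    `intensities` is a list of rows, one row per probe set, with the probes
    in each row ordered from the 5' end to the 3' end of the transcript.
    """
    n_rows = len(intensities)
    n_cols = len(intensities[0])
    return [
        sum(math.log2(row[j]) for row in intensities) / n_rows
        for j in range(n_cols)
    ]


def least_squares_slope(ys):
    """Slope of the ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    x_mean = sum(xs) / len(ys)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Simulated data (made up): 200 probe sets of 11 probes each, with signal
# rising by 0.3 log2 units per position toward the 3' end, plus noise.
random.seed(0)
sample = [
    [2 ** (8 + 0.3 * j) * random.lognormvariate(0, 0.2) for j in range(11)]
    for _ in range(200)
]

means = position_means(sample)
slope = least_squares_slope(means)
print([round(m, 2) for m in means])
print(round(slope, 3))
```

A clearly positive slope means the signal rises toward the 3' end. Comparing slopes between arrays in the same experiment is usually more informative than looking at one value in isolation.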
The '3', '5' and 'M' probe sets are control sets targeting those specific regions (the 3' end, the 5' end and the middle) of the control transcript. I have added a link in the answer to a text describing how you can use them.
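One common use of these control sets is a 3'/5' signal ratio per control transcript. The Python sketch below shows the idea under a few assumptions: the region is taken to be encoded as `5`, `M` or `3` just before the final `_at` (as in IDs like `AFFX-HUMGAPDH/M33197_5_at` or `AFFX-BioB-5_at`), the helper names are hypothetical, and the signal values are invented.

```python
import re

# Assumed ID pattern: AFFX control name, then '-' or '_', then the region
# code (5, M or 3), then the '_at' suffix.
CONTROL_REGION = re.compile(r"^(AFFX-.+)[-_](5|M|3)_at$")


def split_control_id(probeset_id):
    """Return (control_name, region) for a region-specific control ID, else None."""
    match = CONTROL_REGION.match(probeset_id)
    if match is None:
        return None
    return match.group(1), match.group(2)


def three_to_five_ratios(signals):
    """Compute the 3'/5' signal ratio for every control with both regions present.

    `signals` maps probe set ID to a linear-scale (not log) signal value.
    """
    by_control = {}
    for probeset_id, value in signals.items():
        parsed = split_control_id(probeset_id)
        if parsed is None:
            continue  # ordinary probe set, not a 5'/M/3' control
        name, region = parsed
        by_control.setdefault(name, {})[region] = value
    return {
        name: regions["3"] / regions["5"]
        for name, regions in by_control.items()
        if "3" in regions and "5" in regions
    }


# Invented signal values, for illustration only.
example = {
    "AFFX-HUMGAPDH/M33197_5_at": 4000.0,
    "AFFX-HUMGAPDH/M33197_M_at": 4400.0,
    "AFFX-HUMGAPDH/M33197_3_at": 5000.0,
    "1007_s_at": 800.0,
}
print(three_to_five_ratios(example))  # {'AFFX-HUMGAPDH/M33197': 1.25}
```

A ratio well above 1 suggests that signal is being lost toward the 5' end, whether from degraded RNA or from inefficient cDNA synthesis.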
Absolutely true, @Seidel. In fact, both the amplification as used in our paper and the normal labelling procedures are T7 based: RNA amplification
"Total RNA isolated from the LV biopsies was amplified for a second round using a protocol largely based on the linear T7-based procedure described by Baugh [16], with some minor modifications, and thereby resembles the current Affymetrix protocol for first round RNA amplification (GeneChip® Two-Cycle cDNA Synthesis, round 1)."