Affymetrix Analysis At Probe Level
12.3 years ago

Hi,

How can I perform normalization and make present/absent calls at the probe level for an Affymetrix experimental set?

I got data from an Affymetrix CGH chip (http://www.ebi.ac.uk/arrayexpress/arrays/A-AFFY-48). It was designed to test whole-genome hybridization of several bacteria and carries 16 oligos for each predicted ORF. Later on, expression experiments were performed with the same chip. Now I would like to know whether some of the probes are absent, e.g. in the case of shorter transcripts.

With some Bioconductor packages, namely affy and makecdfenv, I created a custom CDF package and was able to load the data. I can read the intensities of the probes of interest, but I am not able to map them to the oligonucleotide sequences. I am not sure whether they are in the order I expect, because IDs are only provided for the probe sets (or I could not find the probe IDs).

The question now is how to map the probe intensities to the oligo sequences (each given with a probe set ID and X/Y coordinates, e.g. Cbur00002_at:366:685). Okay, I found the answer myself: the MAGE-ML files are provided at EBI, and I can use RMAGEML to import them, at least in theory. RMAGEML seems to be outdated, or at least its documentation is.
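A minimal sketch of the coordinate lookup, written in Python only to show the arithmetic. It assumes the X/Y values are 0-based and that cells are numbered row by row starting at 1, which is the convention I understand affy's `xy2indices` to follow (please check the affy documentation before relying on it). The array width `N_COLS`, the sequence and the intensity value are hypothetical placeholders, not values taken from A-AFFY-48.

```python
from typing import Dict, Tuple


def parse_probe_id(probe_id: str) -> Tuple[str, int, int]:
    """Split an ID such as 'Cbur00002_at:366:685' into (probe set, x, y)."""
    probe_set, x, y = probe_id.rsplit(":", 2)
    return probe_set, int(x), int(y)


def xy_to_index(x: int, y: int, n_cols: int) -> int:
    """1-based cell index for 0-based coordinates (assumed convention)."""
    return x + n_cols * y + 1


def join_sequences_to_intensities(
    sequences: Dict[str, str],
    intensities: Dict[int, float],
    n_cols: int,
) -> Dict[str, Tuple[str, float]]:
    """Look up the intensity of every probe via its X/Y coordinates."""
    joined = {}
    for probe_id, sequence in sequences.items():
        _, x, y = parse_probe_id(probe_id)
        joined[probe_id] = (sequence, intensities[xy_to_index(x, y, n_cols)])
    return joined


# Hypothetical inputs: array width, one probe sequence, one cell intensity.
N_COLS = 712
sequences = {"Cbur00002_at:366:685": "ACGTACGTACGTACGTACGTACGTA"}
intensities = {xy_to_index(366, 685, N_COLS): 1234.5}

joined = join_sequences_to_intensities(sequences, intensities, N_COLS)
```

If the convention holds, the same index should select the matching row of the intensity matrix of the AffyBatch, so the order of probes inside a probe set no longer matters.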

Anyway, two things are still unclear to me: how to normalize at the probe level and map the RMAGEML/limma data to the AffyBatch object, and how to determine the presence or absence of parts of ORFs.
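For the normalization part, one common probe-level choice is quantile normalization (it is the normalization step used inside RMA, and in Bioconductor I believe `normalize.quantiles` from preprocessCore does this on a matrix). Whether it is appropriate for a CGH-design chip reused for expression is a separate question. The following is only a sketch of the idea in plain Python, with made-up numbers and ties broken by position rather than averaged.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same intensity distribution.

    `arrays` holds one equally long list of probe intensities per array.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Reference distribution: mean of the k-th smallest value over all arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(column) / len(arrays) for column in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe positions ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, position in enumerate(order):
            out[position] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for two arrays with three probes each.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
```

Each probe keeps its rank within its own array, so the per-probe values stay available for any later probe-level call.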

Does anybody have some hints or a similar workflow?


Regards, Mathias

affymetrix r microarray expression
12.3 years ago

I am not sure I understand your question. Do you really mean individual probes? Since you also mention custom CDFs, that does not seem to be the case. If what you actually want is a traditional Affymetrix analysis, you might want to have a look at our http://www.arrayanalysis.org.


Creating the custom CDF was necessary just for some Bioconductor packages.

To be more specific: I'd like to get present/absent values at the probe level rather than for whole probe sets.


Aha... And how would you do that conceptually? Probe sets are called absent when they are below a threshold and show too much variation. A single probe has no variation within an array, so you would use a threshold only? The danger of removing a few probes from a set in that way is that you artificially make the variation in the remaining set lower and the average value higher. The lower variation could lead to differences between arrays being called statistically significant when they are not. And on a single array, a more variable set with the same average might end up with a higher value after such a treatment. If that approach is what you intend, I would consider it a dangerous way to polish data.
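A small numerical illustration of that effect, using invented intensities for one probe set and an arbitrary threshold (neither comes from real data):

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize(values: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of a probe set."""
    return mean(values), stdev(values)


def drop_below(values: List[float], threshold: float) -> List[float]:
    """Keep only the probes at or above the threshold."""
    return [v for v in values if v >= threshold]


# Hypothetical intensities of the probes in one probe set.
probe_set = [40.0, 55.0, 60.0, 120.0, 130.0, 150.0, 160.0, 180.0]
THRESHOLD = 100.0  # arbitrary cut-off for this illustration

mean_before, sd_before = summarize(probe_set)
mean_after, sd_after = summarize(drop_below(probe_set, THRESHOLD))

print(f"all probes:      mean={mean_before:.1f} sd={sd_before:.1f}")
print(f"after filtering: mean={mean_after:.1f} sd={sd_after:.1f}")
```

Filtering on the same values that are later summarized pushes the mean up and the spread down, which is exactly what would make a downstream test too optimistic.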


I performed nine experiments, four of which are just replicates, so I do have variation across the nine experiments. The problem is that I'd like to (re-)use the probe-level information to determine whether I got the correct start of a bacterial protein. If a protein has alternative possible starts (which is quite often the case in prokaryotes), I would expect the probes located upstream of the correct start to have a much lower intensity across all experiments. I know there are better ways to determine protein starts, and sometimes the longer protein is translated instead of the shorter one. But I don't have any other experimental data and therefore want to reuse the array data. Later on, I'll compare the results with peptides from proteomic data to see whether it works.
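Roughly, the idea could be sketched like this. It is only a heuristic under several assumptions: the probes of an ORF are ordered from 5' to 3', the intensities are log2-transformed and normalized, the 3' half of the ORF is transcribed, and a drop of `min_drop` log2 units counts as "much lower". The function name, the cut-off and the numbers are made up for illustration. Probe-specific affinity differences can produce low probes on their own, so a result like this would still need checking against the peptide data.

```python
from statistics import median
from typing import List


def count_low_upstream_probes(
    log2_by_experiment: List[List[float]], min_drop: float = 2.0
) -> int:
    """Count consecutive 5' probes that are consistently low.

    Each inner list holds the log2 intensities of one experiment, with
    probes ordered 5' to 3'. A probe counts as low when its median across
    experiments lies at least `min_drop` below the median of the 3' half.
    """
    n_probes = len(log2_by_experiment[0])
    per_probe = [
        median(experiment[i] for experiment in log2_by_experiment)
        for i in range(n_probes)
    ]
    reference = median(per_probe[n_probes // 2:])

    low = 0
    for value in per_probe:
        if reference - value >= min_drop:
            low += 1
        else:
            break  # stop at the first probe that looks expressed
    return low


# Hypothetical data: three experiments, eight probes of one ORF.
orf = [
    [5.1, 5.3, 9.8, 10.1, 9.9, 10.4, 10.0, 10.2],
    [4.9, 5.6, 10.2, 9.7, 10.3, 10.1, 9.8, 10.5],
    [5.4, 5.0, 9.9, 10.0, 10.1, 9.9, 10.3, 10.0],
]
n_low = count_low_upstream_probes(orf)
```

In this invented example the first two probes are flagged, which would hint that the real start lies downstream of the predicted one.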

As you can see, I don't want to polish data. I would never do that because I don't like polishing. ;-)
