Question

How can I find the biological meaning of my results at the probe level?

9.2 years ago

XBria ▴ 90

Hi everyone!

I am working with the Affymetrix HG-U133 Plus 2.0 array:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15824

I need to know the biological meaning of the normalized results for one specific probe set only, across 12 tumor and 2 normal samples.

For example, 4 of the 11 probes behave similarly in all tumor and normal samples, but the rest do not.

I have categorized the probes into 4 different groups based on their similarities.

How can I confirm whether these groups reflect the truth, and what do they mean biologically?

Thanks in advance.

affymetrix probes • 1.6k views

score 1 · Answer 1 · 2015-10-09

9.2 years ago

cpad0112 21k

As far as I understand, pooling probe-level expression values for a probeset-level analysis may not be a good idea. However, I am not sure how other scientists handle it.

Ram · Answer 2 · 2015-10-09

Dear Maria,

What do you mean by the "biological meaning of the normalized results"? I believe this is somewhat beside the question you actually want to answer after normalization. In other words, there is no rationale for categorizing your probesets into any groups other than those defined by the phenotype information, that is, tumor and normal samples. Moreover, what is your specific experimental design? Personally, I would suggest searching for probesets with very little or no evidence of expression across the majority of samples and excluding them from further consideration, as these would be "biologically uninformative" for your downstream analysis.

For instance, as you mentioned above, you use the hgu133plus2 array. An excellent package you could use is the PANP R package (http://www.bioconductor.org/packages/release/bioc/html/panp.html), which creates absent/present calls for each of your probesets (you can find more details about the methodology in the paper and vignette). I would then suggest something like the following:

Assuming you have, for instance, 10 samples in total, 5 cancer and 5 control (you could provide more information here, or adjust the following code to your own categories), you could first type:

present <- rowSums(call.matrix=="P") > 5

where call.matrix is the output of the PANP function above (in panp, the Pcalls matrix in the list returned by pa.calls()): a matrix with probesets in the rows and samples in the columns, holding the values A/P/M according to whether a probeset is called absent, present, or marginal in that sample.

and then:

Eset <- Eset[present,]

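where Eset is your normalized expression set. What you finally keep are the probesets called "present" in more than 5 of the 10 samples. Note that a probeset present in all 5 samples of just one condition has exactly 5 "P" calls, so use >= 5 instead of > 5 if you want to keep those as well.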
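
A minimal sketch of a per-group variant of this filter, using a toy call matrix in place of real PANP output. The probeset names, sample names, group labels, and calls below are all made up for illustration, and the "all samples of one group" rule is just one possible choice. It keeps a probeset only if it is called "P" in every sample of at least one condition, which a threshold on the overall count alone does not guarantee.

```r
# Toy stand-in for a PANP call matrix: probesets in rows, samples in columns.
# All names and values are hypothetical.
groups <- factor(rep(c("tumor", "normal"), each = 5))
call.matrix <- rbind(
  ps1 = c("P", "P", "P", "P", "P", "A", "A", "A", "A", "A"),  # present in all tumors only
  ps2 = c("P", "A", "P", "A", "P", "A", "P", "A", "P", "A"),  # 5 "P" calls, scattered
  ps3 = rep("A", 10),                                         # never present
  ps4 = rep("P", 10)                                          # always present
)
colnames(call.matrix) <- paste0("s", 1:10)

# Count "P" calls per probeset within each group
is.present <- call.matrix == "P"
p.tumor  <- rowSums(is.present[, groups == "tumor",  drop = FALSE])
p.normal <- rowSums(is.present[, groups == "normal", drop = FALSE])

# Keep a probeset if it is present in every sample of at least one group
keep <- p.tumor == sum(groups == "tumor") | p.normal == sum(groups == "normal")

# For comparison: the overall-count filter from the answer
present.overall <- rowSums(is.present) > 5

# Toy expression matrix with the same rows, subset the same way as Eset[present, ]
set.seed(1)
expr <- matrix(rnorm(40), nrow = 4, dimnames = dimnames(call.matrix))
expr.filtered <- expr[keep, , drop = FALSE]

print(data.frame(p.tumor, p.normal, keep, present.overall))
```

In this toy example ps1 is called present in all tumor samples but in no normal sample, so the per-group rule keeps it while the overall "> 5" count drops it. With a real ExpressionSet the same logical vector can be used as Eset[keep, ], provided the rows of call.matrix and Eset are in the same order.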

Hope this will help,
Efstathios