Normalised data file: extract values to do a calculation
10.5 years ago
cara78 ▴ 10

Hello,

I have a normalised dataset, referred to in my program as "geneSummaries". For each gene, I want to get the median across the samples. I am using a for loop to do this.

I normalised the .CEL files, and the result is stored in geneSummaries. This is what I have so far:

exprs(geneSummaries)-> num
for(i in num){
    geneSummaries[-1, ]
    geneSummaries[ ,i]
    # The calculation will go here...
}

So my question is about pulling the data out of the file to do the calculation. I want to drop the first column, as this is just the name of the sample. Below is an example of what it looks like when I open it:

7945460  7.471390  7.256158  7.287770  7.545794 
7945462  7.609366  7.528324  7.324294  7.310791
7945475  5.375443  5.749566  5.519073  5.806861

To do that I used geneSummaries[-1, ], but it doesn't work.

And to pull the values out of the ith column I used geneSummaries[ ,i], so that I can do the calculation later, but that doesn't work either.

Could someone suggest how to do this, please?


Thanks so much!

10.5 years ago

I'm assuming that geneSummaries is an eSet. If you just want the medians, it would be a lot simpler to do:

medians <- apply(exprs(geneSummaries), 1, median)

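I'm guessing that the rows, rather than the columns, are actually the probes/genes. The numbers in the first "column" aren't data: the matrix returned by exprs() stores the probe/gene IDs and the sample names as its row and column names (dimnames), and those are simply displayed alongside the values when the matrix is printed. That is also why geneSummaries[-1, ] doesn't do what you expected: the first index selects rows, so it drops the first probe/gene rather than a column of names.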
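A minimal sketch of that, assuming exprs() returns an ordinary numeric matrix with genes in rows. It reuses the three rows shown in the question as a toy matrix; the sample names sample1 to sample4 are hypothetical, since the question doesn't show them. It shows that the IDs are row names rather than a data column, and that an explicit loop over rows gives the same medians as apply().

```r
# Toy stand-in for exprs(geneSummaries): the three rows shown in the question.
# The column (sample) names are hypothetical placeholders.
expr <- matrix(
  c(7.471390, 7.256158, 7.287770, 7.545794,
    7.609366, 7.528324, 7.324294, 7.310791,
    5.375443, 5.749566, 5.519073, 5.806861),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("7945460", "7945462", "7945475"),
                  paste0("sample", 1:4))
)

# The IDs are row names (dimnames), not a column of data:
print(dim(expr))       # 3 rows (probes/genes) x 4 columns (samples)
print(rownames(expr))

# One median per row (gene), taken across the samples.
medians <- apply(expr, 1, median)
print(medians)

# Equivalent explicit loop: expr[i, ] is row i (one gene, all samples),
# whereas expr[-1, ] drops the first gene and expr[, i] is sample i.
loop_medians <- numeric(nrow(expr))
names(loop_medians) <- rownames(expr)
for (i in seq_len(nrow(expr))) {
  loop_medians[i] <- median(expr[i, ])
}
print(all.equal(medians, loop_medians))
```

In real use, expr would be exprs(geneSummaries); the loop is only there to show the indexing, and apply() is the simpler choice.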


Hello,

I have another question, please. I am doing a survival analysis, and the plan is:

1. Normalise the data.
2. Determine the number of genes.
3. For every ith gene on the array:
   a. Find the median of the ith gene across samples.
   b. Set the gene's expression to 1 or 0 depending on whether the raw expression value is above or below the median (diGene).
   c. Fit gene_survival <- coxph(Surv(survival_time, status) ~ diGene)
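The plan above could be sketched in R roughly like this. It is a hypothetical example: the expression matrix (genes in rows, as returned by exprs()), the survival times and the event status are all simulated, the names expr, survival_time, status and di_gene are made up, and it assumes coxph() and Surv() from the survival package. It only illustrates the mechanics of the median split and the per-gene Cox fit, not whether the approach is statistically sound.

```r
library(survival)

set.seed(1)
n_genes   <- 5
n_samples <- 40

# Simulated, hypothetical stand-ins for the real data.
expr <- matrix(rnorm(n_genes * n_samples, mean = 7), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_len(n_samples))))
survival_time <- rexp(n_samples, rate = 0.1)
status        <- rbinom(n_samples, size = 1, prob = 0.7)  # 1 = event observed

# One Cox model per gene, using a median split of that gene's expression.
p_values <- numeric(n_genes)
names(p_values) <- rownames(expr)
for (i in seq_len(n_genes)) {
  # 1 = above this gene's median across samples, 0 = at or below it
  di_gene <- as.integer(expr[i, ] > median(expr[i, ]))
  fit <- coxph(Surv(survival_time, status) ~ di_gene)
  p_values[i] <- summary(fit)$coefficients[1, "Pr(>|z|)"]
}
print(p_values)
```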

The code I have so far:

cel_files <- dir(data_directory, full.names = T, pattern = ".CEL")
print(cel_files)

norm_data <- just.gcrma(cel_files)

exprs(norm_data)-> num

# for loop
for(i in num){
    medians <- apply(exprs(norm_data), 1, median)

    if(medians > norm_data){
        diGene <- 1;
    } else {
        diGene <- 0;
    }
    print(diGene);
}

My question is about the comparison if(medians > norm_data): I need to compare the medians with the raw expression values, but I get this error:

Error in medians > norm_data :
  comparison (6) is possible only for atomic and list types

Am I wrong in thinking that norm_data holds the raw (un-normalised) data? Or does "raw expression" mean the values before quality control and probeset filtering are done?

Thanks


Firstly, remove the for(i in num) loop. It iterates over every individual value in the matrix, which isn't what you want (and the loop variable i isn't used anyway). Besides, medians <- apply(exprs(norm_data), 1, median) already produces the vector of row medians in one line.

Secondly, medians > norm_data doesn't make sense, since medians is a vector of values and norm_data is an eSet object; that is what causes the error.
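A minimal sketch of a comparison that does work, assuming the expression matrix (exprs(norm_data) in the question's code) is compared rather than the eSet itself. Because length(medians) equals the number of rows, R recycles medians down each column, so every value is tested against its own gene's median without any loop. The toy matrix is the three rows shown in the first question, with hypothetical sample names.

```r
# Toy stand-in for exprs(norm_data); sample names are hypothetical.
expr <- matrix(
  c(7.471390, 7.256158, 7.287770, 7.545794,
    7.609366, 7.528324, 7.324294, 7.310791,
    5.375443, 5.749566, 5.519073, 5.806861),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("7945460", "7945462", "7945475"),
                  paste0("sample", 1:4))
)

medians <- apply(expr, 1, median)   # one median per gene (row)

# expr > medians recycles medians down each column, so element [i, j]
# is compared with medians[i]. A more explicit equivalent is
# sweep(expr, 1, medians, ">").
di_gene <- (expr > medians) * 1L    # 1 = above the gene's median, 0 = otherwise
print(di_gene)
```

The result is a 0/1 matrix with the same shape as the expression data, one row of diGene values per gene.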

Finally, regressing on whether a gene's expression in a subject is above or below the median is a bad idea. For one thing, you'll be performing so many regressions (one per gene) that your power will be atrocious. For another, even if a gene did happen to correlate with survival, that isn't very meaningful or useful if the gene's variance is itself really small.
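To make the power point concrete, here is a small illustration with made-up p-values (not results from any real data), using base R's p.adjust(). Suppose 1,000 genes are tested and one has a raw p-value of 0.001 while the rest sit at 0.5: after Bonferroni correction that gene's p-value becomes 1, and even the less strict Benjamini-Hochberg adjustment only brings it to 0.5. A whole array usually carries many more probesets than this, which makes the correction harsher still.

```r
# Hypothetical p-values: one gene at 0.001, the other 999 unremarkable.
raw_p <- c(0.001, rep(0.5, 999))

# Bonferroni multiplies each p-value by the number of tests, capped at 1.
bonf <- p.adjust(raw_p, method = "bonferroni")
print(bonf[1])   # 1

# Benjamini-Hochberg (false discovery rate) is less strict, but it still
# does not rescue a single isolated hit among this many tests.
bh <- p.adjust(raw_p, method = "BH")
print(bh[1])     # 0.5
```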

I would recommend reassessing your approach before spending more time on it.


OK, thanks. I'll have another look at it.
