R: Collapsed to unique genes (Affymetrix microarray)
1
1
Entering edit mode
5.0 years ago
Joe Kherery ▴ 140

I have a list of genes from an Affymetrix microarray analysis.

However I have duplicate genes, I wonder how do I collapsed to unique genes by calculating the mean expression of transcripts from the same gene locus?

Any information is valid.

Thank you!

r microarray Affymetrix • 3.4k views
ADD COMMENT
2
Entering edit mode
5.0 years ago

Hello Joe,

You can use aggregate() or limma's avereps() function.

Kind regards,

Kevin

ADD COMMENT
0
Entering edit mode

Hello kevin

I tried avereps() function, after compute statistics for the data, however I don't think avereps() did the media, it seems that he just removed the gene with the lowest value.

or should I do this before generating my list of DEGs?

ADD REPLY
1
Entering edit mode

Yes, this 'summarisation' step is typically done prior to differential expression analysis. I show here how avereps() calculates the mean: A: How to combine expression values of multiple probes for one gene?

ADD REPLY
1
Entering edit mode

Thank you so much Kevin!

ADD REPLY
0
Entering edit mode

Hello kevin

you know how to access avereps using the function eset <- readExpressionSet("data.txt", header=T)

 `eset_fil<-avereps(eset,ID=eset$geneID)`

I always get the error:

Error in avereps.default(eset, ID = eset$geneID) : No probe IDs

ADD REPLY
1
Entering edit mode

Ah, your input object is an ExpressionSet, so, that changes the behaviour of the function. The bahaviour of many functions varies depending on the class of the input object (SingleCellExperiment, ExpressionSet, EList, data matrix, data frame, etc).

What is stored in eset$geneID?

ADD REPLY
0
Entering edit mode

Actually I have a data.txt file which is normalized microarray data and I also have Annotation data. I can easily analyze normalized data.txt, but in the end when i merge with my Annotation list, I end up with duplicate genes.

So I decided to merge data.txt + annotations and use the avereps function to avoid duplicate genes.

But that's not doing well.

What is stored in eset$geneID?

the list of genes

ADD REPLY
0
Entering edit mode

Could I not do it that way?

1.Find my list of differentially expressed probes after merge with the annotation and then do the avereps? and remove the NA?

ADD REPLY
0
Entering edit mode

Oh, but you can just read data.txt into a standard data-frame or data-matrix. Do you need it to be an ExpressionSet? limma can work with any data-matrix / -frame

ADD REPLY
0
Entering edit mode

Here is a quick example:

a <- as.data.frame(matrix(rexp(200, rate=.1), ncol=20))

a <- data.frame(
  geneID =c(rep("a", 5), rep("g", 5)),
  a)

a
   geneID         V1         V2         V3        V4         V5          V6
1       a 10.0124668 11.9255510  4.0475745  1.358108  5.3276694 31.85456954
2       a  1.7395386 11.2418697 32.3563350 54.580172 12.2343322 14.74439485
3       a  3.2208449  4.4952939  1.8298942  1.452729  2.5981188 12.35995226
4       a  5.4150530 12.3290295  5.6130080 16.716527 14.3667175  8.91892914
5       a 20.0084010  1.3876296 10.0609643  3.777215  0.4839389 11.18110204
6       g  0.4891083  9.6643045  9.6691789 11.450733  8.6788831  1.98622159
7       g 10.3117987 12.0075213  0.1763281 16.698935 14.1390305 16.61070795
8       g 14.1528589  2.3008524  2.4303481 11.761187  2.6730166  1.27720752
9       g  1.2306315  0.9444662 14.0066054  6.947162  2.6716584  3.47272504
10      g  6.0259514 18.1136992  9.9320698 15.407635  0.3642130  0.02209801
          V7           V8         V9        V10        V11        V12
1  10.878040 1.305372e+01  0.9893343 22.7783039  8.8343238  6.4427322
2   7.674188 1.172054e+01  6.1814065  0.7211316  4.9604433  0.5250960
3   4.754873 5.618160e+00 17.1038089  6.2044634 27.5845328 41.8400992
4   3.416827 3.392045e+00  2.1885927 36.6936841  0.7678490  9.8909097
5  29.270410 4.456995e-04  2.7364219 17.5774500 21.6362236  6.9892572
6  21.913587 1.677730e+01  3.9166071  3.9899511  0.8409301  0.7111987
7  11.465969 1.267289e+00  5.6575621  1.7919602  8.8182862  6.5374935
8  13.170292 2.390979e+01 11.3208811  8.5220598  5.2944025 35.6490934
9   6.194696 5.090157e+00  9.3267260  6.7335488 25.7741919  2.3195227
10  2.466330 1.286938e+01  7.5946475 28.1245333 13.6518179  4.7680661
          V13        V14         V15       V16        V17       V18       V19
1   2.9727924  5.2258699 29.55052895 13.821802  0.8710937  2.362678 11.799666
2   9.1994559  0.1900076  1.72736752 24.377843  3.1250694  8.846956  8.866539
3  17.9306954  9.1904710  6.34737243  6.697301  1.3553958 30.822013  6.671391
4   2.7093821  2.1168722  0.01587546 22.641845 11.4590629  4.073248 21.109629
5   5.6086534  0.7472446  3.59445844  5.283256  0.8475345 13.292848  6.956337
6   3.6795144  9.1000291 17.90547453 16.219322 13.2109528  2.547036 45.453231
7   0.1632086  4.0360949  1.90470806 11.935634  6.2560181 27.077792 14.666115
8   3.4062190  9.9844650  2.83656092  6.193150  1.0027526 49.312002  8.554801
9   2.6365014 14.2642789  2.01253534 16.994186  9.3008740 16.310225  2.268120
10  1.1968069  9.5823697  5.07947367  6.222578  7.0315114  4.734242 23.603816
          V20
1  20.1157690
2  13.9122525
3  19.4145607
4   4.1486591
5  15.0403439
6  11.6190681
7   2.3373972
8   3.5504567
9   9.7002796
10  0.6964145

limma::avereps(a[,2:ncol(a)], a$geneID)
        V1       V2        V3       V4       V5        V6       V7        V8
a 8.079261 8.275875 10.781555 15.57695 7.002155 15.811790 11.19887  6.756982
g 6.442070 8.606169  7.242906 12.45313 5.705360  4.673792 11.04218 11.982784
        V9       V10      V11       V12      V13      V14      V15      V16
a 5.839913 16.795007 12.75667 13.137619 7.684196 3.494093 8.247121 14.56441
g 7.563285  9.832411 10.87593  9.997075 2.216450 9.393448 5.947751 11.51297
       V17      V18      V19       V20
a 3.531631 11.87955 11.08071 14.526317
g 7.360422 19.99626 18.90922  5.580723
ADD REPLY
0
Entering edit mode

Hello Kevin,

I try it, but dont work.

ADD REPLY
0
Entering edit mode

I don't think I need to read it as ExpressionSet, my data.txt contains probe ID names and samples ... but I don't think i can analyze it any other way ...

or is it simply read use eset_fil<-avereps(eset,ID=eset$geneID) and do as I usually do in limma?

  f <- factor(targets$Target, levels = unique(targets$Target))
 design <- model.matrix(~0 + f)
 colnames(design) <- levels(f)
 design <- model.matrix(~0 + f)
 colnames(design) <- levels(f)
 fit <- lmFit(eset, design)
write.table(fit, file="fit.txt", sep="\t", quote=FALSE)
contrast.matrix <- makeContrasts("drug-Control", levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
output <- topTreat(fit2, coef=1, number=Inf, adjust.method="BH", lfc=2)
write.table(output, file="DEGS.txt", sep="\t", quote=FALSE)

or there is a point I have to make after I do the avereps? para usar o limma?

ADD REPLY
0
Entering edit mode

I will try as you exemplified. Thanks

ADD REPLY
0
Entering edit mode

Fala português? In the example above, you should have:

fit <- lmFit(eset_fil, design)

..where eset_fil is the output of avereps()

ADD REPLY
0
Entering edit mode

Sí, pero no mucho.

I did as you said, but I still get this error.

 Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be numeric

I tried transform eset_fil into data frame but same error. I triend as.matrix.... not results

ADD REPLY
0
Entering edit mode

Can you show all of your steps from when you read in the file's contents? Also, paste here a few rows of your data after you have read it in. Remember that you do not have to use readExpressionSet() - just use read.csv() or read.table()

ADD REPLY

Login before adding your answer.

Traffic: 1927 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6