Multiple transcripts for the same gene in an array gene expression profile?
6.3 years ago

I am analysing microarray data from GEO for the first time. The data has already been processed and normalized, but the Illumina bead probes have captured multiple transcripts for the same gene (I assume), so some of my genes appear more than once (e.g. MCL1, a gene I am looking at, has two different Illumina IDs associated with it). I want to look at total gene expression and at gene expression differences between disease states, so I need to aggregate the data. How can I do this when some genes appear multiple times?


Maybe you can take the average expression of all the transcripts as the gene's expression.

6.3 years ago

The ideal situation would be to obtain the raw data files (CEL or TXT; edit: or whatever Illumina BeadArrays use) and then summarise expression over genes using RMA with median polish.
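For reference, the median-polish step that RMA uses to collapse several probes into one value per sample can be sketched with base R's `stats::medpolish()`. This is a minimal sketch that assumes a hypothetical matrix of log2 probe intensities for a single gene (rows = probes, columns = samples, values made up). It covers only the summarisation step, not RMA's background correction or quantile normalisation.

```r
# Hypothetical log2 probe-level intensities for ONE gene
# (rows = probes, columns = samples); values are invented for illustration.
probes <- matrix(
  c(7.1, 7.4, 9.0, 9.2,
    6.8, 7.0, 8.7, 8.9,
    7.5, 7.9, 9.4, 9.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("probe", 1:3), paste0("sample", 1:4))
)

# Median polish decomposes the matrix as
#   y[i, j] = overall + row[i] + col[j] + residual[i, j]
# by repeatedly sweeping out row and column medians.
fit <- medpolish(probes, trace.iter = FALSE)

# RMA-style gene-level summary: overall effect + sample (column) effect,
# giving one expression value per sample
gene_expr <- fit$overall + fit$col
gene_expr
```

Because the probe and sample effects are estimated with medians rather than means, a single outlying probe has less influence on the summary than it would with a plain average.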

Given the data that you've got, you can just summarise it yourself as follows:

```r
df <- data.frame(
  c("gene1","gene1","gene2","gene2","gene2","gene3"),
  c(1,2,3,10,20,30),
  c(60,50,40,3,2,1),
  c(7,8,9,100,110,120),
  c(12,11,10,9,8,7)
)

colnames(df) <- c("gene","sample1","sample2","sample3","sample4")

df
##    gene sample1 sample2 sample3 sample4
## 1 gene1       1      60       7      12
## 2 gene1       2      50       8      11
## 3 gene2       3      40       9      10
## 4 gene2      10       3     100       9
## 5 gene2      20       2     110       8
## 6 gene3      30       1     120       7
```

Summarise by mean:

```r
aggregate(df[,2:ncol(df)], by=df[1], mean)
##    gene sample1 sample2 sample3 sample4
## 1 gene1     1.5      55     7.5    11.5
## 2 gene2    11.0      15    73.0     9.0
## 3 gene3    30.0       1   120.0     7.0
```

Summarise by median:

```r
aggregate(df[,2:ncol(df)], by=df[1], median)
##    gene sample1 sample2 sample3 sample4
## 1 gene1     1.5      55     7.5    11.5
## 2 gene2    10.0       3   100.0     9.0
## 3 gene3    30.0       1   120.0     7.0
```

Summarise by sum:

```r
aggregate(df[,2:ncol(df)], by=df[1], sum)
##    gene sample1 sample2 sample3 sample4
## 1 gene1       3     110      15      23
## 2 gene2      33      45     219      27
## 3 gene3      30       1     120       7
```
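With a processed GEO series, the probe IDs are usually the row names of the expression matrix, and the gene symbols come from a separate platform annotation. The following is a minimal sketch of the same per-gene mean under that layout, using base R's `rowsum()`. The probe IDs `probe1` to `probe6` and the mapping `probe2gene` are hypothetical placeholders (with real data the mapping would come from the platform's GPL annotation table), and the numbers are the toy values from the example above.

```r
# Hypothetical processed matrix: rownames are probe IDs, columns are samples
expr <- matrix(
  c( 1, 60,   7, 12,
     2, 50,   8, 11,
     3, 40,   9, 10,
    10,  3, 100,  9,
    20,  2, 110,  8,
    30,  1, 120,  7),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("probe", 1:6), paste0("sample", 1:4))
)

# Hypothetical probe-to-gene mapping (stand-in for the platform annotation)
probe2gene <- c(probe1 = "gene1", probe2 = "gene1",
                probe3 = "gene2", probe4 = "gene2", probe5 = "gene2",
                probe6 = "gene3")

# Gene label for each row, in the row order of the matrix
genes <- unname(probe2gene[rownames(expr)])

# rowsum() adds up rows that share a label; dividing by the number of
# probes per gene (computed with the same grouping, so rows line up)
# gives the per-gene mean for every sample
sums   <- rowsum(expr, genes)
counts <- rowsum(rep(1, length(genes)), genes)
gene_means <- sums / as.vector(counts)
gene_means
```

The result has one row per gene and should match the `aggregate(..., mean)` output above.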

Kevin


Dear @Kevin Blighe,

How should I handle the FDR for each gene? For example, if I take the mean expression across the probes of the same gene, should I also take the mean of their FDR values? Could you point me to any reference?

Best, Leite


You could just fit your own linear or logistic model to the summarised data with lm() or glm(), and then adjust the p-values with p.adjust(), or use my package: https://bioconductor.org/packages/release/data/experiment/html/RegParallel.html

It is better to have the raw data files (CEL, TXT, GAL, or DAT), though.
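A minimal sketch of the lm() + p.adjust() route: it computes new p-values from the gene-level (summarised) values and adjusts them once, instead of combining per-probe FDR values. It assumes a hypothetical, simulated gene-by-sample matrix `gene_mat` and a two-group `condition` factor; the gene names, sample labels and group sizes are placeholders.

```r
set.seed(1)

# Hypothetical gene-level matrix (genes x samples), e.g. after aggregating
# probes to genes; values are simulated for illustration only
n_genes <- 5
gene_mat <- matrix(rnorm(n_genes * 8, mean = 8), nrow = n_genes,
                   dimnames = list(paste0("gene", 1:n_genes),
                                   paste0("sample", 1:8)))

# Hypothetical design: 4 control and 4 disease samples
condition <- factor(rep(c("control", "disease"), each = 4))

# One linear model per gene; keep the p-value of the disease coefficient
pvals <- apply(gene_mat, 1, function(y) {
  fit <- lm(y ~ condition)
  summary(fit)$coefficients["conditiondisease", "Pr(>|t|)"]
})

# Benjamini-Hochberg FDR, computed once across all genes
fdr <- p.adjust(pvals, method = "BH")
data.frame(p = pvals, fdr = fdr)
```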
