Question

Multiple transcripts for same gene in array gene expression profile?

7.1 years ago

ravelarvargas • 0

I am analysing microarray data from GEO for the first time. The data has already been processed and normalized, but the Illumina beads have caught multiple transcripts for the same gene (I assume), meaning that some of my genes appear twice (e.g. MCL1 is a gene I am looking at, and it has two different Illumina IDs associated with it). I am trying to look at total gene expression and at gene expression differences between disease states, so I need to aggregate the data. How can I do this when some genes appear multiple times?

Expression profiling by array RNA-Seq illumina R

updated 7.1 years ago by Kevin Blighe • written 7.1 years ago by ravelarvargas

Maybe you can take the average expression of all the transcripts for a gene as that gene's expression.

7.1 years ago by Prakash

score 4 · Answer 1 · 2018-08-17

The ideal situation would be to obtain the raw data as CEL or TXT files (edit: or whatever BEAD arrays use) and then summarise expression over genes using RMA with median polish.
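To give a rough idea of what the median polish step does, here is a minimal sketch using `medpolish()` from base R's stats package on made-up probe-level values for a single gene. This is only the summarisation idea, not the full RMA pipeline (no background correction or quantile normalisation), and the numbers and the names `probes`, `fit` and `gene_summary` are purely illustrative.

```r
# Toy probe-level values for one gene: rows are probes, columns are samples.
# In a real RMA run these would be background-corrected, normalised log2 intensities.
probes <- matrix(
  c( 3, 40,   9, 10,
    10,  3, 100,  9,
    20,  2, 110,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("probe", 1:3), paste0("sample", 1:4))
)

# medpolish() fits: value = overall + probe (row) effect + sample (column) effect + residual
fit <- medpolish(probes, maxiter = 50, trace.iter = FALSE)

# Gene-level summary per sample: overall effect plus that sample's column effect
gene_summary <- fit$overall + fit$col
gene_summary
```

The point of the row (probe) effect is that a probe which reads consistently high or low across all samples does not drag the gene-level value with it, which a plain mean cannot do.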

Given the data that you've got, you can just summarise it yourself as follows:

df <- data.frame(
  c("gene1","gene1","gene2","gene2","gene2","gene3"),
  c(1,2,3,10,20,30),
  c(60,50,40,3,2,1),
  c(7,8,9,100,110,120),
  c(12,11,10,9,8,7)
)

colnames(df) <- c("gene","sample1","sample2","sample3","sample4")

df
   gene sample1 sample2 sample3 sample4
1 gene1       1      60       7      12
2 gene1       2      50       8      11
3 gene2       3      40       9      10
4 gene2      10       3     100       9
5 gene2      20       2     110       8
6 gene3      30       1     120       7

summarise by mean

aggregate(df[,2:ncol(df)], by=df[1], mean)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    11.0      15    73.0     9.0
3 gene3    30.0       1   120.0     7.0

summarise by median

aggregate(df[,2:ncol(df)], by=df[1], median)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    10.0       3   100.0     9.0
3 gene3    30.0       1   120.0     7.0

summarise by sum

aggregate(df[,2:ncol(df)], by=df[1], sum)
   gene sample1 sample2 sample3 sample4
1 gene1       3     110      15      23
2 gene2      33      45     219      27
3 gene3      30       1     120       7
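For a table closer to what you describe (one row per Illumina probe, with both a probe ID and a gene symbol), the same idea might look like the sketch below. The column names `probe_id` and `gene`, the probe IDs and the expression values are all made up for illustration; adjust them to whatever your GEO table actually contains. It uses the formula interface of `aggregate()`, which by default drops rows containing NA.

```r
# Hypothetical layout: one row per probe, with a probe ID and a gene symbol.
# All names and values here are invented for illustration.
expr <- data.frame(
  probe_id = c("ILMN_0001", "ILMN_0002", "ILMN_0003"),
  gene     = c("MCL1", "MCL1", "TP53"),
  sample1  = c(8.0, 9.0, 6.5),
  sample2  = c(7.0, 10.0, 6.0)
)

# Drop the probe ID column, then average every remaining column within each gene
by_gene <- aggregate(. ~ gene, data = expr[, names(expr) != "probe_id"], FUN = mean)

# Move the gene symbols to row names to get a numeric genes x samples matrix
mat <- as.matrix(by_gene[, -1])
rownames(mat) <- as.character(by_gene$gene)
mat
```

Swap `FUN = mean` for `median` or `sum` as above. The resulting matrix has one row per gene, so MCL1 appears only once.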

Kevin