Question

Repeated ensembl IDs in microarray DEG analysis

10 weeks ago

Pereira G ▴ 10

Hello,

I'm working on a mouse microarray dataset (GPL8321). I annotated the dataset using the affycoretools annotateEset function and proceeded with the limma pipeline for differential expression. However, looking at the gene names of the DEGs, I noticed that some genes appear more than once, each time with different expression values. Looking further, I also noticed that on this platform a large number of probe IDs map to the same Ensembl ID.

eset <- rma(celdata)
eset <- annotateEset(eset, mouse430a2.db, columns = c("PROBEID", "ENTREZID", "SYMBOL", "GENENAME", "ENSEMBL"))

table(duplicated(fData(eset)$ENSEMBL))
FALSE  TRUE 
13113  9577
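
One caveat about this count: `duplicated()` treats repeated `NA` values as duplicates, so probes with no Ensembl mapping are included in the TRUE column alongside genuine multi-probe genes. A minimal base-R sketch of how the two could be separated is shown here, assuming a toy vector `ens` with made-up IDs standing in for `fData(eset)$ENSEMBL`:

```r
# Toy stand-in for fData(eset)$ENSEMBL; these IDs are made up for illustration.
ens <- c("ENSMUSG_A", "ENSMUSG_A", NA, NA, "ENSMUSG_B", "ENSMUSG_A")

# What table(duplicated(...)) reports as TRUE: repeated IDs *and* repeated NAs.
n_flagged <- sum(duplicated(ens))

# Probes with no Ensembl mapping at all.
n_missing <- sum(is.na(ens))

# Repeats among mapped probes only, i.e. genuine multi-probe genes.
mapped <- ens[!is.na(ens)]
n_true_dup <- sum(duplicated(mapped))

# Number of probes per Ensembl ID, to see how the repeats are distributed.
probes_per_gene <- table(mapped)

print(c(flagged = n_flagged, missing = n_missing, true_dup = n_true_dup))
print(probes_per_gene)
```

On the real object, replacing `ens` with `fData(eset)$ENSEMBL` should show how many of the 9577 flagged probes are unmapped rather than truly repeated.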

My question is: is it best practice to remove the duplicated Ensembl IDs before the differential expression analysis? Wouldn't this high number of duplicates interfere with the statistical analysis and the p-value computation?

Should this be handled by computing the mean value of the probes that map to the same Ensembl ID? And how can I do that on a large ExpressionSet object (eset)?
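
To make the two options concrete, here is a minimal base-R sketch on a toy matrix. It is only an illustration, not a recommendation of either approach. The names `expr` and `gene` and all the values are made up. On the real data they would presumably be `exprs(eset)` and `fData(eset)$ENSEMBL`.

```r
# Toy expression matrix: 4 probes x 2 samples (values are made up).
expr <- rbind(p1 = c(2, 4), p2 = c(4, 8), p3 = c(1, 1), p4 = c(9, 9))
colnames(expr) <- c("s1", "s2")

# Toy probe-to-gene mapping; p4 has no Ensembl ID.
gene <- c("gA", "gA", "gB", NA)

# Option 1: average all probes that map to the same gene.
keep <- !is.na(gene)                                  # drop unmapped probes
sums <- rowsum(expr[keep, , drop = FALSE], group = gene[keep])
n    <- as.vector(table(gene[keep])[rownames(sums)])  # probes per gene, same order as sums
avg  <- sums / n                                      # row i is divided by n[i]

# Option 2: keep one probe per gene, here the one with the highest mean expression.
ave_expr <- rowMeans(expr)
ord  <- order(ave_expr, decreasing = TRUE)            # most highly expressed probes first
best <- ord[!duplicated(gene[ord]) & !is.na(gene[ord])]
one_per_gene <- expr[best, , drop = FALSE]

print(avg)
print(one_per_gene)
```

limma also provides `avereps()`, which averages the rows of a matrix that share an ID, so it could replace the manual `rowsum()` step in option 1. Option 2 leaves the individual probe measurements untouched and only reduces the table to one row per gene.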

R annotation DEGs microarray limma • 197 views
