Question

gene filtering for agilent microarray using Limma

5

Entering edit mode

9.7 years ago

rohit ▴ 70

Hello everyone,

I'm analysing agilent microarray data using limma package. I want to do gene filter to remove the probes that do not give any expression values. Is there any workflow to filter genes in limma package?

Thank you in advance.

agilent microarray limma gene filtering R • 7.3k views

ADD COMMENT • link updated 2.1 years ago by Di Wu • 0 • written 9.7 years ago by rohit ▴ 70

0

Entering edit mode

Hi Rohit,

Please, how you did do to filter, I ask the same question?

Thanks for help,

ADD REPLY • link 8.5 years ago by OmBibo • 0

0

Entering edit mode

Hi Rohit, Please, how you did do to filter, I ask the same question? Thanks for help,

ADD REPLY • link 8.5 years ago by OmBibo • 0

0

Entering edit mode

sad, nothing improvement?

ADD REPLY • link 6.7 years ago by BioLite ▴ 20

zx8754 · Answer 1 · 2018-09-05

4

Entering edit mode

6.7 years ago

Kevin Blighe 89k

In microarrays, expression is measured as fluorescence intensities from probes that hybridise to their target cDNA sequences. Virtually all probes will fluoresce and return values.

Genes that are not expressed may fall into the background noise (or be slightly above it), the level for which is then corrected during RMA normalisation, the typical normalisation strategy. Background noise can be inferred from, for example, the expression gained from negative control probes. In RNA-seq, conversely, genes that are not expressed will have 0 count values.

You should ideally not do any filtering prior to running limma. Only probes that have been shown to repeatedly fail should be removed. Probes covering genes that you definitively know to not be expressed in your tissue of study could also be removed. It can be inferred that, after normalisation, those genes with the lowest values are reflective of genes that have virtually nil expression.

Kevin

ADD COMMENT • link 6.7 years ago by Kevin Blighe 89k

0

Entering edit mode

OH! So happy! Kevin answered this question! I am a newbie and have been concerned about Agilent single channel microarray analysis question for a long time. But I still bothered by some questions. All remarks on Biostar show Kevin you are always kindful and helpful, may I borrow some your precious time for informing me something important but I ignored? Thanks from my heart! I am trying to replicate the Agilent part of this article:https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002794. Followed by its referred package Limma and Agi4x44PreProcess, I couldn't reach the article's results. I get a fault probe filter number and also get an error mention in the after analysis, my code as following:

library(limma)
#loading data
targets<-read.delim("E-MTAB-812.sdrf.txt")
dd<-targets[,c("Array.Data.File","Factor.Value.DISEASE_STATE.","Characteristics.RNA.Integrity.Number.")]
raw_data<-read.maimages(targets[,"Array.Data.File"],source = "agilent",green.only = TRUE,other.columns =c("gIsWellAboveBG","gIsSaturated","gIsFeatPopnOL","gIsFeatNonUnifOL") )

#Subtract the background
bg_data<-backgroundCorrect(raw_data,method = "normexp")
#Normalize between the arrays
nm_data<-normalizeBetweenArrays(bg_data,method = "quantile")
nm_data_avg<-avereps(nm_data,ID=nm_data$genes$ProbeName)
dim(nm_data)

#QC report
eSet<-new("ExpressionSet", exprs = nm_data$E,annotation=nm_data$genes$GeneName)
QC_report_nor<-arrayQualityMetrics(eSet)

#probe filter
Control<-nm_data$genes$ControlType==1
IsExpr<- rowSums(nm_data$other$gIsWellAboveBG>0) <=40
fil_data1<-nm_data[!Control&!IsExpr,]
fil_data2<-avereps(fil_data1,ID=fil_data1$genes$ProbeName)
dim(fil_data2)

#gene annotation
library(clusterProfiler)
library(hgug4112a.db)
tr_gene<-fil_data2$genes$ProbeName
Symbol<-bitr(tr_gene,fromType = "PROBEID",toType = "SYMBOL",OrgDb = hgug4112a.db,drop = FALSE)
EntrezID<-bitr(tr_gene,fromType = "PROBEID",toType = "ENTREZID",OrgDb = hgug4112a.db,drop = FALSE)
fil_data2$genes<-cbind(fil_data2$genes,Symbol,EntrezID)
fil_data2$genes<-fil_data2$genes[,c("PROBEID","SYMBOL","ENTREZID")]

##remove NA
no_symbol<-is.na(fil_data2$genes$SYMBOL)
no_entr<-is.na(fil_data2$genes$ENTREZID)
fil_data_rm<-fil_data2[!no_symbol&!no_entr,]

#DEG filter
rownames(fil_data_rm$E)<-fil_data_rm$genes$SYMBOL
fil_data_sy<-fil_data_rm$E
treatment<-targets[,"Factor.Value.DISEASE_STATE."]
TR_levels<-c("neurologically healthy control","Parkinson disease")
Group<-factor(treatment,levels = TR_levels)
design2<- model.matrix(~Group+0)
colnames(design2) <- c("Control","PD")

fita<- lmFit(fil_data_rm$E, design2)
cont.matrix <- makeContrasts(Diff=PD-Control, levels=design2)##!!!colnames could not too long -R can't identify
fitb<- contrasts.fit(fita, cont.matrix)
fitc<- eBayes(fitb)

Result_A<-topTable(fitc,adjust="BH",coef = "Diff",genelist =fil_data_rm$genes$SYMBOL ,number = Inf,p.value = 0.05)
summary(decideTests(fitc),lfc = 1.5)
de_gen<-decideTests(fitc,lfc =1.5)

Thanks for your precious time again!

ADD REPLY • link updated 6.7 years ago by zx8754 12k • written 6.7 years ago by BioLite ▴ 20

0

Entering edit mode

Hey, thank you for the comments. I try my best - that's all.

The lack of reproducibility of experiments is a very common problem in research. The study that you are using was published 6 years ago, so, new versions of limma and other packages have been released since then. This means that, starting from the raw data, you are highly unlikely to obtain the exact same values / results. However, generally, the interpretation of the results should be the same.

Some things for you:

did you try to start from the Processed data that the authors used? THis will contain the normalised data originally produced by the authors.
did you closely follow the methods in the manuscript?

In relation to the methods, I see key phrases like:

"Nine outlier samples were detected based on the arrayQualityMetrics default criteria and were dropped from further analyses"
"Microarray probes were removed if they had expression values outside the detectable spike-in range in more than 50% of the control arrays and more than 50% of the PD arrays, or if they had any of the Agilent flags IsWellAboveBG = 0, gIsSaturated = 1, gIsFeatPopnOL = 1, gIsFeatNonUnifOL = 1 in more than 75% of the arrays."
"The median expression value was used for replicated probes that passed the above filtering criteria. A total of 39,122 probes out of the total 45,015 probes present on the microarray chips were analyzed in the expression and eSNP studies. The expression data for the retained probes of the 53 arrays (E-MTAB-812 ArrayExpress dataset) were quantile normalized, and the obtained values were base 2 logarithm transformed."

Looking at your code, I am not sure that you did pre-filtering? They appear to have removed a lot of probes that were assumed to have failed, which is a practice performed in microarray analyses (as I mention above).

ADD REPLY • link 6.7 years ago by Kevin Blighe 89k

0

Entering edit mode

Dear Prof. Kevin Blighe, may I ask a question? As you said we should ideally not do any filtering prior to running limma, is there any possbile for us do filtering genes before the DEGs analyse? Such as following code, I only choose interested genes (a gene list named 'wt') and do the DEGs analyse, should it be more powerful? Thanks

fit <- lmscFit(MA.reomoveCTprobe[wt,], design,correlation=0.8)
cont.matrix <- makeContrasts(severe-Control,mild-Control,severe-mild,levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)

ADD REPLY • link 2.1 years ago by Di Wu • 0

0

Entering edit mode

Dear Professor Di Wu, this code looks okay to me. The correlation value just relates to intra-spot (probe) correlation.

ADD REPLY • link 2.1 years ago by Kevin Blighe 89k

0

Entering edit mode

Dear Prof. Kevin Blighe, many thanks for your quick reply. About the correlation value, I used the following code for getting the value.

corfit <- intraspotCorrelation(MA.reomoveCTprobe, design)
  corfit$consensus
    [1] 0.1659504

But it seems that it is lower than normal. So I just use the 0.8 for the furtheranalyse, is it incorrect? Do you think that filter to less than 100 genes may get more solid result from the whole array? Thanks again. Best wishes, Di

ADD REPLY • link 2.1 years ago by Di Wu • 0

0

Entering edit mode

Hi, does this mean that when we use RMA method for processing, we do not have to filter for lowly-expressed probes across arrays, because it will be done during RMA?

ADD REPLY • link 2.7 years ago by Fate • 0