Hello everyone,
I'm analysing agilent microarray data using limma package. I want to do gene filter to remove the probes that do not give any expression values. Is there any workflow to filter genes in limma package?
Thank you in advance.
Hello everyone,
I'm analysing agilent microarray data using limma package. I want to do gene filter to remove the probes that do not give any expression values. Is there any workflow to filter genes in limma package?
Thank you in advance.
In microarrays, expression is measured as fluorescence intensities from probes that hybridise to their target cDNA sequences. Virtually all probes will fluoresce and return values.
Genes that are not expressed may fall into the background noise (or be slightly above it), the level for which is then corrected during RMA normalisation, the typical normalisation strategy. Background noise can be inferred from, for example, the expression gained from negative control probes. In RNA-seq, conversely, genes that are not expressed will have 0 count values.
You should ideally not do any filtering prior to running limma. Only probes that have been shown to repeatedly fail should be removed. Probes covering genes that you definitively know to not be expressed in your tissue of study could also be removed. It can be inferred that, after normalisation, those genes with the lowest values are reflective of genes that have virtually nil expression.
Kevin
OH! So happy! Kevin answered this question! I am a newbie and have been concerned about Agilent single channel microarray analysis question for a long time. But I still bothered by some questions. All remarks on Biostar show Kevin you are always kindful and helpful, may I borrow some your precious time for informing me something important but I ignored? Thanks from my heart! I am trying to replicate the Agilent part of this article:https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002794. Followed by its referred package Limma and Agi4x44PreProcess, I couldn't reach the article's results. I get a fault probe filter number and also get an error mention in the after analysis, my code as following:
library(limma)
#loading data
targets<-read.delim("E-MTAB-812.sdrf.txt")
dd<-targets[,c("Array.Data.File","Factor.Value.DISEASE_STATE.","Characteristics.RNA.Integrity.Number.")]
raw_data<-read.maimages(targets[,"Array.Data.File"],source = "agilent",green.only = TRUE,other.columns =c("gIsWellAboveBG","gIsSaturated","gIsFeatPopnOL","gIsFeatNonUnifOL") )
#Subtract the background
bg_data<-backgroundCorrect(raw_data,method = "normexp")
#Normalize between the arrays
nm_data<-normalizeBetweenArrays(bg_data,method = "quantile")
nm_data_avg<-avereps(nm_data,ID=nm_data$genes$ProbeName)
dim(nm_data)
#QC report
eSet<-new("ExpressionSet", exprs = nm_data$E,annotation=nm_data$genes$GeneName)
QC_report_nor<-arrayQualityMetrics(eSet)
#probe filter
Control<-nm_data$genes$ControlType==1
IsExpr<- rowSums(nm_data$other$gIsWellAboveBG>0) <=40
fil_data1<-nm_data[!Control&!IsExpr,]
fil_data2<-avereps(fil_data1,ID=fil_data1$genes$ProbeName)
dim(fil_data2)
#gene annotation
library(clusterProfiler)
library(hgug4112a.db)
tr_gene<-fil_data2$genes$ProbeName
Symbol<-bitr(tr_gene,fromType = "PROBEID",toType = "SYMBOL",OrgDb = hgug4112a.db,drop = FALSE)
EntrezID<-bitr(tr_gene,fromType = "PROBEID",toType = "ENTREZID",OrgDb = hgug4112a.db,drop = FALSE)
fil_data2$genes<-cbind(fil_data2$genes,Symbol,EntrezID)
fil_data2$genes<-fil_data2$genes[,c("PROBEID","SYMBOL","ENTREZID")]
##remove NA
no_symbol<-is.na(fil_data2$genes$SYMBOL)
no_entr<-is.na(fil_data2$genes$ENTREZID)
fil_data_rm<-fil_data2[!no_symbol&!no_entr,]
#DEG filter
rownames(fil_data_rm$E)<-fil_data_rm$genes$SYMBOL
fil_data_sy<-fil_data_rm$E
treatment<-targets[,"Factor.Value.DISEASE_STATE."]
TR_levels<-c("neurologically healthy control","Parkinson disease")
Group<-factor(treatment,levels = TR_levels)
design2<- model.matrix(~Group+0)
colnames(design2) <- c("Control","PD")
fita<- lmFit(fil_data_rm$E, design2)
cont.matrix <- makeContrasts(Diff=PD-Control, levels=design2)##!!!colnames could not too long -R can't identify
fitb<- contrasts.fit(fita, cont.matrix)
fitc<- eBayes(fitb)
Result_A<-topTable(fitc,adjust="BH",coef = "Diff",genelist =fil_data_rm$genes$SYMBOL ,number = Inf,p.value = 0.05)
summary(decideTests(fitc),lfc = 1.5)
de_gen<-decideTests(fitc,lfc =1.5)
Thanks for your precious time again!
Hey, thank you for the comments. I try my best - that's all.
The lack of reproducibility of experiments is a very common problem in research. The study that you are using was published 6 years ago, so, new versions of limma and other packages have been released since then. This means that, starting from the raw data, you are highly unlikely to obtain the exact same values / results. However, generally, the interpretation of the results should be the same.
Some things for you:
In relation to the methods, I see key phrases like:
Looking at your code, I am not sure that you did pre-filtering? They appear to have removed a lot of probes that were assumed to have failed, which is a practice performed in microarray analyses (as I mention above).
Dear Prof. Kevin Blighe, may I ask a question? As you said we should ideally not do any filtering prior to running limma, is there any possbile for us do filtering genes before the DEGs analyse? Such as following code, I only choose interested genes (a gene list named 'wt') and do the DEGs analyse, should it be more powerful? Thanks
fit <- lmscFit(MA.reomoveCTprobe[wt,], design,correlation=0.8)
cont.matrix <- makeContrasts(severe-Control,mild-Control,severe-mild,levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
Dear Prof. Kevin Blighe, many thanks for your quick reply. About the correlation
value, I used the following code for getting the value.
corfit <- intraspotCorrelation(MA.reomoveCTprobe, design)
corfit$consensus
[1] 0.1659504
But it seems that it is lower than normal. So I just use the 0.8 for the furtheranalyse, is it incorrect? Do you think that filter to less than 100 genes may get more solid result from the whole array? Thanks again. Best wishes, Di
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hi Rohit,
Please, how you did do to filter, I ask the same question?
Thanks for help,
Hi Rohit, Please, how you did do to filter, I ask the same question? Thanks for help,
sad, nothing improvement?