Filtering low-expressed genes in microarray data for WGCNA
11 months ago
Fluke ▴ 10

Hi everyone

I have two datasets: one is RNA-seq data from TCGA and the other is microarray data. For the RNA-seq data, I filtered out low-expressed genes and normalized the counts using DESeq2:

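# Keep genes with a count of at least 10 in at least 475 samples
dds75 <- dds[rowSums(counts(dds) >= 10) >= 475, ]
# Variance-stabilizing transformation of the filtered counts
vsd75 <- vst(dds75)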

Everything went well for the RNA-seq data. For the microarray data, I normalized with gcRMA, but I am not sure how to filter out low-expressed genes before running WGCNA, because DESeq2 cannot be applied to microarray data.

microarray WGCNA DESeq2 • 838 views
11 months ago

If you have followed a standard protocol for normalisation of the microarray, then no filtering is required for the purposes of preparing the data for WGCNA.

If you wish, you can still filter based on low intensity, by following this advice: https://bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html#10_Filtering_based_on_intensity
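
As a rough illustration of that kind of intensity-based filtering, here is a minimal base-R sketch. It is not taken verbatim from the linked workflow: the simulated matrix `expr`, the `groups` vector, and the cutoff of 4 are all assumptions made for the example, and in practice the cutoff would be chosen by looking at the histogram of your own data.

```r
# Simulated log2 intensities: 1000 probes x 12 arrays (hypothetical data)
set.seed(1)
expr <- matrix(rnorm(1000 * 12, mean = 6, sd = 2), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("s", 1:12)))
groups <- rep(c("A", "B"), times = c(5, 7))  # hypothetical sample groups

# Inspect per-probe median intensities to help choose a cutoff
probe_medians <- apply(expr, 1, median)
hist(probe_medians, breaks = 100, main = "Median intensity per probe")

threshold   <- 4                   # assumed cutoff; pick it from the histogram
min_samples <- min(table(groups))  # size of the smallest group

# Keep probes that exceed the cutoff in at least `min_samples` arrays
keep <- rowSums(expr > threshold) >= min_samples
expr_filtered <- expr[keep, , drop = FALSE]
dim(expr_filtered)
```

Requiring the cutoff to be exceeded in as many arrays as the smallest group means a probe expressed in only one group is still retained.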

You should also ensure that you remove the control probes from your dataset prior to running WGCNA.
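
A minimal sketch of dropping control probes, assuming an Affymetrix array where control probesets carry the `AFFX` prefix and the probe IDs are the row names of the expression matrix. The small matrix `expr` below is made up purely for illustration.

```r
# Hypothetical probe-by-sample matrix containing two control probesets
expr <- matrix(1:12, nrow = 4,
               dimnames = list(c("1007_s_at", "AFFX-BioB-5_at",
                                 "1053_at", "AFFX-r2-Ec-bioC-3_at"),
                               c("s1", "s2", "s3")))

# Flag rows whose probe ID starts with "AFFX" and drop them
is_control <- grepl("^AFFX", rownames(expr))
expr_no_controls <- expr[!is_control, , drop = FALSE]
rownames(expr_no_controls)
```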

Kevin


Hi Kevin, first of all, thanks for your suggestion. I followed your advice: I did not filter the genes, and I removed the control probes from the dataset before running WGCNA (as I understand it, control probe names start with AFFX). The problem is that, while running WGCNA, I checked the distribution of the data to detect outliers using:

# Flag genes and samples with too many missing values or zero variance
gsg <- goodSamplesGenes(norm.counts)
summary(gsg)
gsg$allOK
if (!gsg$allOK) {
  # Report, then drop, the flagged genes (columns) and samples (rows)
  if (sum(!gsg$goodGenes) > 0)
    printFlush(paste("Removing genes:", paste(colnames(norm.counts)[!gsg$goodGenes], collapse = ", ")))
  if (sum(!gsg$goodSamples) > 0)
    printFlush(paste("Removing samples:", paste(rownames(norm.counts)[!gsg$goodSamples], collapse = ", ")))
  norm.counts <- norm.counts[gsg$goodSamples, gsg$goodGenes]
}
# Cluster the samples (rows) and look at the distribution of merge heights
sampleTree <- hclust(dist(norm.counts), method = "average")
byHist <- hist(sampleTree$height, main = "Histogram of Height", xlab = "Height")

The distribution is not bell-shaped, so I plotted the sample tree to double-check for outliers:

plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="",
 cex.lab = 1.5,cex.axis = 1.5, cex.main = 2)

After that I excluded the samples above the height cutoff, then ran pickSoftThreshold and got this result:

[image: pickSoftThreshold result]

Do you have any suggestions for fixing this problem?
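
For reference, the height-based sample exclusion mentioned above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the `cutHeight` of 30 is chosen for this toy example, and base `cutree()` is used in place of WGCNA's `cutreeStatic()` so that the snippet runs without extra packages.

```r
set.seed(1)
# Hypothetical samples-by-genes matrix; sample s10 is shifted to act as an outlier
norm.counts <- matrix(rnorm(10 * 50), nrow = 10,
                      dimnames = list(paste0("s", 1:10), paste0("g", 1:50)))
norm.counts["s10", ] <- norm.counts["s10", ] + 10

sampleTree <- hclust(dist(norm.counts), method = "average")

# Cut the tree at a height read off the dendrogram (assumed value here)
cutHeight <- 30
clust <- cutree(sampleTree, h = cutHeight)

# Keep only the samples in the largest cluster
largest <- as.integer(names(which.max(table(clust))))
keepSamples <- clust == largest
norm.counts.clean <- norm.counts[keepSamples, , drop = FALSE]
rownames(norm.counts.clean)
```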

P.S. I used ReadAffy() and gcrma(Data) to read in and normalize the expression data. Next, I mapped the probes using hgu133plus2.db and excluded the unmapped (NA) probes and the control probes. Finally, I used the avereps() function to average duplicated gene IDs.
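
To make the last step concrete, here is a small base-R sketch of what averaging duplicated gene IDs amounts to. It is a stand-in for avereps(), not its implementation, and the matrix `expr` and the `gene_ids` vector are invented for the example. The row order of the result may differ from what avereps() returns.

```r
# Hypothetical probe-level matrix; the first two probes map to the same gene
expr <- matrix(c(2, 4, 6, 8,
                 10, 20, 30, 40), nrow = 4,
               dimnames = list(NULL, c("s1", "s2")))
gene_ids <- c("TP53", "TP53", "BRCA1", "EGFR")

# Sum the rows per gene, then divide by the number of probes per gene
sums <- rowsum(expr, group = gene_ids)
n_probes <- table(gene_ids)[rownames(sums)]
avg <- sums / as.vector(n_probes)
avg
```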

Thanks again for helping me.
