Question

Pre-processing of RNA-Seq data for WGCNA

0

Entering edit mode

8.2 years ago

gokce.ouz ▴ 70

Hi All,

I have 2 experimental condition and each condition has 18 samples. However, they are sequenced in 3 batches. 1 st batch 6 pair (disease & control), 2nd batch 6 pair (disease & control), 3rd batch 6 pair (disease & control).

Each batch done by different people over different times so I believe we have batch effect. As I am using DESeq2 for my analysis I set my model design as “Batch+Type”. I used removeBatchEffect from limma package and plotted PCA & did hierarchical clustering using spermann correlation (I believe it is better to use Spearman rather than the Pearson??) to see which samples are outliers. Another issue I am facing is 1st & 2nd batches are 76 bp paired end, however the last batch is 101 bp paired end.I did not add this into the design separately.

I would like to do differential gene expression analysis and WGCA. So I have multiple questions:

Should I use rlog() for differential gene expression analysis and getVarianceStabilizedData() for WGCNA? or can I use same normalization for both? If yes, which one do you suggest ?
Should I remove any samples according to these PCA plots & dendogram or just do analysis as I already included batch effect into the model design as ~Batch+Type. Or is it like for DGE analysis I do not to remove any sample but for WGCNA I should remove?

For both normalization type it looks like I have to remove the Patient-Sph4 as it look has distance from others in PCA plot also (the one at the middletop)

I really appreciate if you give me some insight,

Thanks in advance,

ddsMat.pre<- DESeqDataSetFromMatrix(seMat.pre, sample.table.ERSvsPT.pre, design=~Batch+Type)
dds.pre<-DESeq(ddsMat.pre)
dds.pre <- dds.pre[ rowSums(counts(dds.pre)) > 1, ]
datExpr0<- assay(dds.pre)
gsg = goodSamplesGenes(datExpr0, verbose = 5);
gsg$allOK

RLD

rld <- rlog(dds.pre, blind= FALSE)
rlogMat <- assay(rld)
save(rld, file="RLD_Pre_Type+Batch")

design=model.matrix(~sample.table.ERSvsPT.pre$Type)
removedbatch.rld=removeBatchEffect(rlogMat, batch=sample.table.ERSvsPT.pre$Batch, design= design )
write.csv(removedbatch.rld, file="removedbatch.rld_36patient")
rld.removed<-rld
assay(rld.removed)<-removedbatch.rld

par(mfrow = c(1,2));
cex1 = 1.5;
plotPCA(rld.batch, intgroup=c("Type", "Batch"))
plotPCA(rld.removed, intgroup=c("Type", "Batch"))

colData(rld.removed)$Type<-as.factor(colData(rld.removed)$Type)
colData(rld.removed)$Batch<-as.factor(colData(rld.removed)$Batch)
data.removed<-plotPCA(rld.removed, intgroup=c("Batch","Type"), returnData=TRUE)
data.removed$name<-colData(rld.removed)$Patient
percentVar.removed <- round(100 * attr(data.removed, "percentVar"))
p2<-ggplot(data.removed, aes(PC1, PC2, color=Batch, shape=Type, label= name)) + geom_text(size=5,hjust = 0, nudge_x = 0.05) + geom_point(size=3) + xlab(paste0("PC1: ",percentVar.removed[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar.removed[2],"% variance")) + ggtitle("PCA of RLD_pre_Type+Batch ") + coord_fixed()

mat.rld.removed<- t(assay(rld.removed))
rownames(mat.rld.removed) <- paste( rld.removed$Written.ID)
plot(hcluster(dist(mat.rld.removed), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of  RLD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of RLD_Pre_Type+Batch.png")
dev.off()

VSD

vsd = getVarianceStabilizedData(dds.pre)
save(vsd, file="VSD_Pre_Type+Batch")

design=model.matrix(~sample.table.ERSvsPT.pre$Type)
vsd.removed=removeBatchEffect(vsd, batch=sample.table.ERSvsPT.pre$Batch, design= design )

colnames(vsd.removed) <-  rld.removed$Written.ID
plot(hcluster(dist(t(vsd.removed)), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of VSD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of vsd_Pre_Type+Batch.png")
dev.off()

PCA plot after rld() & removeBatchEffect():

image: PCA plot after `rld()` & `removeBatchEffect()`

Hierarchical Clustering Dendogram after rld() & removeBatchEffect():

image: Hierarchical Clustering Dendogram after `rld()` & `removeBatchEffect()`

Hierarchical Clustering Dendogram after vsd() & removeBatchEffect():

image: Hierarchical Clustering Dendogram after `vsd()` & `removeBatchEffect()`

RNA-Seq WGCNA removeBatchEffect DESeq2 • 6.5k views

ADD COMMENT • link updated 7 months ago by Ram 44k • written 8.2 years ago by gokce.ouz ▴ 70

0

Entering edit mode

What type of data did you feed into removeBatchEffect? - Normalised counts, rlog transformed, vst transformed, voom'd?

ADD REPLY • link 8.2 years ago by andrew.j.skelton73 6.6k

0

Entering edit mode

I tried both rlog &vsd. The 1st dendogram = the rlog transformed and batch removed, 2nd dendogram from the vsd transformed and batch removed.

ADD REPLY • link 8.2 years ago by gokce.ouz ▴ 70

0

Entering edit mode

Try Voom Transforming - That way you know that the removeBatchEffect function is getting what it expects.

ADD REPLY • link 8.2 years ago by andrew.j.skelton73 6.6k

0

Entering edit mode

Hi Andrew, thanks for the suggestion; but I read that voom transformed data is not applicable for WGCNA analysis. So if we assume I did the vsd normalization, do you suggest me to remove any samples ?

ADD REPLY • link 8.2 years ago by gokce.ouz ▴ 70

0

Entering edit mode

Where did you read that? - Voom transforms RNA seq counts to "microarray-like" data that should be fine with WGCNA. If you want to batch correct, and you want to use Limma's removeBatchEffect (which I think is best in your current situation), then try and get the data to a point that Limma expects - Voom is best in that regard. See point 4 here

ADD REPLY • link 8.2 years ago by andrew.j.skelton73 6.6k

0

Entering edit mode

Thanks a lot; I will implement it into my analysis immediately.

ADD REPLY • link 8.2 years ago by gokce.ouz ▴ 70

0

Entering edit mode

Hi Andrew,

Sorry to disturb but may I ask a few quick questions ? While transforming the count data with voom()

1 - For design I am using both "~ Batch + Type", is it the correct approach ?

2 - Should I use TMM-normalization before using voom ()

voomplot after TMM normalization Hierarchical clustering after removedBatchEffect

3 - Should directly input the raw count data to voom () voom plot Hierarchical clustering after removedBatchEffect

4 - Even though I deleted the rows with zero, my voom plots do not look like the plots in this paper. Am I doing something wrong ?

5 - Do you think it is better if I do differential gene expression analysis also using Limma's commands?

I really appreciate if you give me some insight,

Thanks in advance,

ADD REPLY • link 8.2 years ago by gokce.ouz ▴ 70