Pre-processing of RNA-Seq data for WGCNA
0
0
Entering edit mode
8.2 years ago
gokce.ouz ▴ 70

Hi All,

I have 2 experimental condition and each condition has 18 samples. However, they are sequenced in 3 batches. 1 st batch 6 pair (disease & control), 2nd batch 6 pair (disease & control), 3rd batch 6 pair (disease & control).

Each batch done by different people over different times so I believe we have batch effect. As I am using DESeq2 for my analysis I set my model design as “Batch+Type”. I used removeBatchEffect from limma package and plotted PCA & did hierarchical clustering using spermann correlation (I believe it is better to use Spearman rather than the Pearson??) to see which samples are outliers. Another issue I am facing is 1st & 2nd batches are 76 bp paired end, however the last batch is 101 bp paired end.I did not add this into the design separately.

I would like to do differential gene expression analysis and WGCA. So I have multiple questions:

  1. Should I use rlog() for differential gene expression analysis and getVarianceStabilizedData() for WGCNA? or can I use same normalization for both? If yes, which one do you suggest ?
  2. Should I remove any samples according to these PCA plots & dendogram or just do analysis as I already included batch effect into the model design as ~Batch+Type. Or is it like for DGE analysis I do not to remove any sample but for WGCNA I should remove?

For both normalization type it looks like I have to remove the Patient-Sph4 as it look has distance from others in PCA plot also (the one at the middletop)

I really appreciate if you give me some insight,

Thanks in advance,

ddsMat.pre<- DESeqDataSetFromMatrix(seMat.pre, sample.table.ERSvsPT.pre, design=~Batch+Type)
dds.pre<-DESeq(ddsMat.pre)
dds.pre <- dds.pre[ rowSums(counts(dds.pre)) > 1, ]
datExpr0<- assay(dds.pre)
gsg = goodSamplesGenes(datExpr0, verbose = 5);
gsg$allOK

RLD

rld <- rlog(dds.pre, blind= FALSE)
rlogMat <- assay(rld)
save(rld, file="RLD_Pre_Type+Batch")

design=model.matrix(~sample.table.ERSvsPT.pre$Type)
removedbatch.rld=removeBatchEffect(rlogMat, batch=sample.table.ERSvsPT.pre$Batch, design= design )
write.csv(removedbatch.rld, file="removedbatch.rld_36patient")
rld.removed<-rld
assay(rld.removed)<-removedbatch.rld

par(mfrow = c(1,2));
cex1 = 1.5;
plotPCA(rld.batch, intgroup=c("Type", "Batch"))
plotPCA(rld.removed, intgroup=c("Type", "Batch"))

colData(rld.removed)$Type<-as.factor(colData(rld.removed)$Type)
colData(rld.removed)$Batch<-as.factor(colData(rld.removed)$Batch)
data.removed<-plotPCA(rld.removed, intgroup=c("Batch","Type"), returnData=TRUE)
data.removed$name<-colData(rld.removed)$Patient
percentVar.removed <- round(100 * attr(data.removed, "percentVar"))
p2<-ggplot(data.removed, aes(PC1, PC2, color=Batch, shape=Type, label= name)) + geom_text(size=5,hjust = 0, nudge_x = 0.05) + geom_point(size=3) + xlab(paste0("PC1: ",percentVar.removed[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar.removed[2],"% variance")) + ggtitle("PCA of RLD_pre_Type+Batch ") + coord_fixed()

mat.rld.removed<- t(assay(rld.removed))
rownames(mat.rld.removed) <- paste( rld.removed$Written.ID)
plot(hcluster(dist(mat.rld.removed), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of  RLD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of RLD_Pre_Type+Batch.png")
dev.off()

VSD

vsd = getVarianceStabilizedData(dds.pre)
save(vsd, file="VSD_Pre_Type+Batch")

design=model.matrix(~sample.table.ERSvsPT.pre$Type)
vsd.removed=removeBatchEffect(vsd, batch=sample.table.ERSvsPT.pre$Batch, design= design )

colnames(vsd.removed) <-  rld.removed$Written.ID
plot(hcluster(dist(t(vsd.removed)), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of VSD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of vsd_Pre_Type+Batch.png")
dev.off()

PCA plot after rld() & removeBatchEffect():

image: PCA plot after `rld()` & `removeBatchEffect()`

Hierarchical Clustering Dendogram after rld() & removeBatchEffect():

image: Hierarchical Clustering Dendogram after `rld()` & `removeBatchEffect()`

Hierarchical Clustering Dendogram after vsd() & removeBatchEffect():

image: Hierarchical Clustering Dendogram after `vsd()` & `removeBatchEffect()`

RNA-Seq WGCNA removeBatchEffect DESeq2 • 6.5k views
ADD COMMENT
0
Entering edit mode

What type of data did you feed into removeBatchEffect? - Normalised counts, rlog transformed, vst transformed, voom'd?

ADD REPLY
0
Entering edit mode

I tried both rlog &vsd. The 1st dendogram = the rlog transformed and batch removed, 2nd dendogram from the vsd transformed and batch removed.

ADD REPLY
0
Entering edit mode

Try Voom Transforming - That way you know that the removeBatchEffect function is getting what it expects.

ADD REPLY
0
Entering edit mode

Hi Andrew, thanks for the suggestion; but I read that voom transformed data is not applicable for WGCNA analysis. So if we assume I did the vsd normalization, do you suggest me to remove any samples ?

ADD REPLY
0
Entering edit mode

Where did you read that? - Voom transforms RNA seq counts to "microarray-like" data that should be fine with WGCNA. If you want to batch correct, and you want to use Limma's removeBatchEffect (which I think is best in your current situation), then try and get the data to a point that Limma expects - Voom is best in that regard. See point 4 here

ADD REPLY
0
Entering edit mode

Thanks a lot; I will implement it into my analysis immediately.

ADD REPLY
0
Entering edit mode

Hi Andrew,

Sorry to disturb but may I ask a few quick questions ? While transforming the count data with voom()

1 - For design I am using both "~ Batch + Type", is it the correct approach ?

2 - Should I use TMM-normalization before using voom ()

voomplot after TMM normalization Hierarchical clustering after removedBatchEffect

3 - Should directly input the raw count data to voom () voom plot Hierarchical clustering after removedBatchEffect

4 - Even though I deleted the rows with zero, my voom plots do not look like the plots in this paper. Am I doing something wrong ?

5 - Do you think it is better if I do differential gene expression analysis also using Limma's commands?

I really appreciate if you give me some insight,

Thanks in advance,

ADD REPLY

Login before adding your answer.

Traffic: 2652 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6