Batch effect correction - miRNA sequencing - DESeq2
1
0
Entering edit mode
6.0 years ago
prekrish • 0

Hi,

First of all, I apologize for a long post .

I am analyzing miRNA sequencing data generated from human samples. I have 10 paired samples - control and treated. These samples were sequenced in two batches - 4 pairs in one batch and 6 pairs in another batch. The objective is to identify differentially expressed miRNAs between control and treated groups. The experimental design is as follows:

Sample_ID                   Condition       Patient     Batch
IonXpress_RNA_001   Healthy          A         1
IonXpress_RNA_002   Treated          A         1
IonXpress_RNA_003   Healthy          B         1
IonXpress_RNA_004   Treated      B         1
IonXpress_RNA_005   Healthy          C         1
IonXpress_RNA_006   Treated      C         1
IonXpress_RNA_007   Healthy          D         1
IonXpress_RNA_008   Treated      D         1
IonXpress_RNA_011   Healthy          E         2
IonXpress_RNA_012   Treated      E         2
IonXpress_RNA_017   Healthy          F         2
IonXpress_RNA_018   Treated      F         2
IonXpress_RNA_021   Healthy         G          2
IonXpress_RNA_022   Treated     G          2
IonXpress_RNA_023   Healthy         H          2
IonXpress_RNA_024   Treated     H          2
IonXpress_RNA_025   Healthy          I             2
IonXpress_RNA_026   Treated          I         2
IonXpress_RNA_027   Healthy         J          2
IonXpress_RNA_028   Treated         J          2

I tried to analyze the data using DESeq2. I checked to see if the samples cluster based on batch and yes, the PCA plot clearly showed two distinct clusters - batch 1 and batch 2. I used variance stabilized counts for this purpose. Then, I removed batch effects using assay(vsd)<-limma::removeBatchEffect(assay(vsd), vsd$Batch) . PCA plot now showed as a single cluster and not as two distinct clusters.

To identify differentially expressed miRNAs, I used <-DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~Patient + Condition). Since this is a paired sample analysis, I used the variable 'patient' in my design. I did get a list of differentially exprssed miRNAs. However, when I tried to include "batch" in the design, I am getting the following error message -

invalid class “DESeqDataSet” object: the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others

I am trying to figure out the reason for this error - is this because the conditions (healthy and treated) are overlapping between both the batches?

Is there a way that I can rectify this ?

Should I change my design ?

If I do not correct for batch effects or remove the batch effects, will it drastically change my output? I understand that batch effects can lead to spurious results. However, I am asking this because ~ 75% of the differentially expressed miRNAs obtained from my analysis have got validated in RT-PCR.

I understand that the input data has to be raw counts for DESeq2 but is there a way that I can do a differential expression analysis for batch corrected dataset and compare the results with the uncorrected dataset? This is for learning purpose.

Thank you very much

Best regards,
Preethi

Batch-effect DESeq2 miRNA-Seq • 2.3k views
ADD COMMENT
3
Entering edit mode
6.0 years ago

If each patient is in only one batch, you cannot correct for both patient and batch. You cannot include both in your design. You might as well look at your samples with PCA, see how glaring the batch effect is.

ADD COMMENT
0
Entering edit mode

Thank you very much for your reply. I did do a PCA and the difference looked glaring. Both the batches separated as two clusters. I am not sure how to share the image here. But yes, they separated as two clusters and when I tried PCA plot using batch effects removed variance stabilized counts, the batches merged as one cluster.

ADD REPLY

Login before adding your answer.

Traffic: 2541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6