Advice for the following PCA analysis
6.8 years ago
Mozart ▴ 330

Hello there, I am running an RNA-seq analysis comparing 4 different conditions (WT-treated, WT-untreated, KO-treated, KO-untreated), and I think the following PCA is affected by a batch effect.

red=KO-untreated
green=KO-treated
blue=WT-untreated
violet=WT-treated

[Image: PCA plot]

First of all, can you confirm that this kind of bias might be present? Secondly, how would you recommend proceeding?

RNA-Seq • 2.0k views

Can you also provide the legend?

And can you please elaborate on green and violet? They are both KO-treated; what is the difference between them?

Sorry, I have just edited the legend.

Yes, there is a clear 'bias', as evidenced by the variation explained by PC1. I put the word 'bias' in quotation marks because, on the off chance, there may be a biological explanation for the finding.

Were those samples processed in a different batch? Are they KO or WT? There is no legend in your plot.

Edit: thanks for editing your post to define the groupings

Very sorry about that. I have just edited the legend.

If they are simply from a different batch, then include batch as a variable in the design formula, assuming that you're running DESeq2. That will most likely mitigate the batch effect.
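
A minimal sketch of what including batch in the design looks like, using base R only. The sample sheet below is hypothetical (8 samples, 2 batches, the 4 conditions from the question); the column names `batch` and `condition` are assumptions, not the poster's actual table. It builds the design matrix for `~ batch + condition` and checks that it is full rank, which is what allows the batch effect to be estimated separately from the condition effect.

```r
# Hypothetical sample sheet: 8 samples, 2 batches, 4 conditions (illustrative values only).
sampleTable <- data.frame(
  batch     = factor(rep(c("batch1", "batch2"), each = 4)),
  condition = factor(rep(c("KO_CTL", "KO_TRE", "WT_CTL", "WT_TRE"), times = 2))
)

# Design matrix for ~ batch + condition: batch as covariate, condition as the variable of interest.
design <- model.matrix(~ batch + condition, data = sampleTable)

# If batch were perfectly confounded with condition, the matrix would not be full rank
# and the batch term could not be separated from the condition term.
full_rank <- qr(design)$rank == ncol(design)

print(colnames(design))
print(full_rank)

# With DESeq2 loaded, the same formula would be passed in the call used later in this thread:
# dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~ batch + condition)
```

If `full_rank` comes out `FALSE` for a real sample sheet, that usually means every sample of some condition sits in one batch, and a batch term cannot fix that.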

Hi Kevin, yep, I have done that using sva.

For Kevin (I hope he reads this, since I am not able to post another reply for the next 24 hours): let's see if I have understood your suggestion correctly. Instead of doing this:

dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3+condition)

Are you suggesting that I type this instead?

dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3)

Thanks for your help, Kevin. I am afraid I have just one column with all the possible conditions: KO_CTL, KO_TRE, WT_CTL, WT_TRE. My resultsNames(dds) output is:

[1] "Intercept" "condition_KO_TRE_vs_KO_CTL" "condition_WT_CTL_vs_KO_TRE"
[4] "condition_WT_TRE_vs_KO_CTL"

Probably I am doing something wrong.

That seems to have improved it. Can you nevertheless just include batch as a covariate in the DESeq2 design formula? I am almost certain that this will mitigate the effect that you see (if indeed those samples on the right-hand side of your plot are from a different batch).

Hi, I can see your edited post. Why do you have 3 batch variables? There should be just a single batch variable. Your parameters should be something like this:

Batch   Treatment  Group
batch1  untreated  CTL
batch1  treated    LAP
batch2  treated    CTL
batch2  untreated  CTL
etc.

Then use:

~batch+Treatment+Group

You could also merge Treatment and Group into a single variable with paste(), if you wish.
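
As a rough illustration of the paste() suggestion, the sketch below merges two columns into a single condition factor and sets its reference level explicitly. The sample sheet is hypothetical: the KO/WT and CTL/TRE values are borrowed from the question rather than from the table above, and the choice of "WT_CTL" as reference is an assumption made for illustration only.

```r
# Hypothetical sample sheet with a single Batch column plus Treatment and Group.
sampleTable <- data.frame(
  Batch     = factor(rep(c("batch1", "batch2"), each = 4)),
  Treatment = rep(c("CTL", "TRE"), times = 4),
  Group     = rep(rep(c("KO", "WT"), each = 2), times = 2)
)

# Merge Group and Treatment into one variable, e.g. "KO_CTL".
sampleTable$condition <- factor(
  paste(sampleTable$Group, sampleTable$Treatment, sep = "_")
)

# Set the reference level explicitly; by default R takes the alphabetically first level.
sampleTable$condition <- relevel(sampleTable$condition, ref = "WT_CTL")

# One batch term plus the merged condition term.
design <- model.matrix(~ Batch + condition, data = sampleTable)

print(levels(sampleTable$condition))
print(colnames(design))
```

Every condition coefficient is then expressed relative to the chosen reference level. Comparisons between two non-reference levels (for example KO_TRE against KO_CTL here) are not among the default coefficients and would need to be requested as an explicit contrast.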
