Hi all
I have downloaded raw RNA-Seq counts from TCGA for DE analysis. There are 223 primary tumor samples and 41 adjacent normal tissue samples. I performed the normalization with DESeq2 in two ways:
First, I normalized all of the primary tumor samples (223) and normal samples (41) together, then visualized the results as a volcano plot, which you can see here:
Second, I selected only the matched samples (41 tumor and 41 adjacent normal) and normalized them on their own. As you can see below, the plot has changed significantly:
My question is: which one is correct, and why? I read how DESeq2 normalizes data here, but I can't understand how grouping affects the normalization. In other words, what is the effect of group size on DESeq2 normalization?
Thanks for any help.
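For context on the normalization question, here is a minimal sketch of the median-of-ratios idea behind DESeq2's size factors, written in plain Python rather than R. It is not DESeq2's code: the function name `size_factors`, the sample names, and the counts are all made up for illustration. The point it tries to show is that the per-gene reference (a geometric mean across samples) is built from whichever samples are in the dataset, so the size factor of one and the same sample can change when the sample set changes. Group labels play no role in this step.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2 itself).

    counts: dict mapping sample name -> list of raw counts, same gene order.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Genes with a zero count in any sample have a geometric mean of 0
    # and are left out of the reference.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across ALL samples passed in.
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # Size factor = median over genes of (sample count / gene reference).
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g]
                           for g in usable))
        for s in samples
    }


# Made-up toy counts: two "tumor" samples and one "normal" sample, four genes.
toy = {
    "T1": [100, 200, 50, 400],
    "T2": [200, 400, 100, 800],
    "N1": [100, 200, 50, 100],
}

sf_all = size_factors(toy)                                   # all samples
sf_subset = size_factors({k: toy[k] for k in ("T1", "N1")})  # subset only

print("T1 size factor, all samples:", sf_all["T1"])
print("T1 size factor, subset only:", sf_subset["T1"])
```

The two printed values for `T1` differ even though its counts are identical in both runs, because dropping `T2` changes the reference every sample is compared against.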
Can you post the code for both workflows so that it's easier to understand what was done?
Dear Friederike,
Thanks for your reply. When I went to post my code here, I found a bad mistake in it that changed all my results, so I've edited the second figure. However, my original question remains: which normalization is correct? Here is my code:
First strategy
Second:
You included different samples, why would you expect the results to look the same?
You are right, they shouldn't be the same, but I still don't know which one is correct.
So, basically you're asking whether you should compare 223 primary tumor samples vs. 41 normal samples or whether you should focus on the matched samples (41 vs 41) only?
While neither one is THE correct solution, I'd say that comparing matched samples is preferable, but you should make use of the fact that they're matched and add the patient as a covariate to your design, e.g.
~ patient + group
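To illustrate why the patient term helps, here is a rough sketch in plain Python. It does not reproduce DESeq2's negative binomial model; it only compares tumor and normal values for a single hypothetical gene, with made-up normalized counts and patient labels (`P1` to `P3`). When baseline expression differs a lot between patients, the within-patient differences are much less noisy than the values within each group, which is the variation the patient term is meant to absorb.

```python
import math
from statistics import mean, stdev

# Made-up normalized counts for ONE gene in three hypothetical patients.
# Baseline expression differs 100-fold between patients, but the tumor
# sample is always twice the matched normal sample.
tumor = {"P1": 80, "P2": 800, "P3": 8000}
normal = {"P1": 40, "P2": 400, "P3": 4000}

log_t = {p: math.log2(v) for p, v in tumor.items()}
log_n = {p: math.log2(v) for p, v in normal.items()}

# Unpaired view: group means, with patient-to-patient spread left as noise.
unpaired_lfc = mean(log_t.values()) - mean(log_n.values())
spread_within_group = stdev(log_t.values())

# Paired view: per-patient tumor-minus-normal differences, so each
# patient's baseline cancels out.
paired_diffs = [log_t[p] - log_n[p] for p in tumor]
paired_lfc = mean(paired_diffs)
spread_paired = stdev(paired_diffs)

print("log2 fold change (unpaired):", unpaired_lfc)
print("spread within tumor group:  ", spread_within_group)
print("log2 fold change (paired):  ", paired_lfc)
print("spread of paired differences:", spread_paired)
```

Both views estimate the same fold change here, but the spread that the estimate is judged against is far smaller in the paired view. That is the intuition behind putting `patient` before `group` in the design.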