Padj Values Problem For Multiple Samples in DESeq2
3.8 years ago
Aynur ▴ 60

Dear Biostars,

I performed a DEG analysis of mouse RNA-Seq data using DESeq2 and ran into the following problem; I hope you can help me. I have 3 different treatments and one control group, and each condition has 2 biological replicates. This is my code:

## generate the metadata
condition <- c('control','control','lif','lif','il6','il6','cytokine','cytokine')
sample <- c('con1','con2','lif1','lif2','il6.1','il6.2','cyto1','cyto2')
Metadata <- data.frame(sample, condition)
head(data)
                       con1   con2   lif1   lif2   il6.1   il6.2   cyto1   cyto2
ENSMUSG00000000001     3110   3141   3299   3148    3698    3214    3239    3402

I am not showing my code to make the dds object, as it is hard for me to format code here. In summary, my code worked, and I extracted the results table from the above data with all 8 samples. For the results, I used the following code, replacing "il6" with each of the other conditions in turn.

IL6DEseqresult <- as.data.frame(results(dds,
                                        contrast = c("condition", "il6", "control"),
                                        alpha = 0.05))

However, when I repeated the same analysis starting from fewer samples, I got very different padj values, which changed my lists of significant genes. That is, I first subset the data to the two groups being contrasted and then built the dds object from, for example, the control and lif groups only. In this way I had just four samples (control1, control2, lif1, lif2), and I repeated the same for the other groups. The padj values were very different. So I am not sure which strategy is the correct one. Can you please help me?

Thank you

sequencing RNA-seq R • 1.3k views

I am not showing my code to make the dds object, as it is hard for me to format code here.

It's not that hard, though: select your code and press the code button in the formatting bar.

3.8 years ago
ATpoint 85k

Using a different number of samples affects normalization and dispersion estimation, so seeing different results is normal and expected. Whether you want to make two separate dds objects or a single one depends both on the question you want to answer and on how the samples behave. If the samples are from the same experiment and behave similarly (e.g. in a PCA), then it is probably a good idea to make a single object and use the contrast argument to get the pairwise comparisons you want. If the treatment groups behave very differently (e.g. IL6 had huge dispersion and Lif had not), then it might make sense to conduct separate analyses.
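A minimal sketch of the two strategies, assuming DESeq2 is installed. The count matrix `sim_counts` is simulated and every simulation parameter is an illustrative assumption, not the poster's data; only the sample and condition names follow the question.

```r
library(DESeq2)

set.seed(1)
# Metadata as in the question; 'control' is set as the reference level.
sample <- c("con1", "con2", "lif1", "lif2", "il6.1", "il6.2", "cyto1", "cyto2")
condition <- factor(rep(c("control", "lif", "il6", "cytokine"), each = 2),
                    levels = c("control", "lif", "il6", "cytokine"))
metadata <- data.frame(sample, condition, row.names = sample)

# Simulated negative-binomial counts (hypothetical stand-in for the real matrix).
n_genes <- 2000
mu <- 2^runif(n_genes, 4, 12)
disp <- 4 / mu + 0.1
mu_mat <- matrix(mu, nrow = n_genes, ncol = 8)
# Assumption for illustration: the first 100 genes are 4-fold up in 'lif'.
mu_mat[1:100, condition == "lif"] <- mu_mat[1:100, condition == "lif"] * 4
sim_counts <- matrix(rnbinom(n_genes * 8, mu = mu_mat, size = 1 / disp), ncol = 8,
                     dimnames = list(paste0("gene", seq_len(n_genes)), sample))
storage.mode(sim_counts) <- "integer"

# Strategy 1: one object with all 8 samples, then a pairwise contrast.
dds_all <- DESeqDataSetFromMatrix(sim_counts, colData = metadata, design = ~ condition)
dds_all <- DESeq(dds_all, quiet = TRUE)
res_all <- results(dds_all, contrast = c("condition", "lif", "control"), alpha = 0.05)

# Strategy 2: subset to control + lif before fitting (4 samples).
keep <- metadata$condition %in% c("control", "lif")
meta_sub <- droplevels(metadata[keep, ])
dds_sub <- DESeqDataSetFromMatrix(sim_counts[, keep], colData = meta_sub,
                                  design = ~ condition)
dds_sub <- DESeq(dds_sub, quiet = TRUE)
res_sub <- results(dds_sub, contrast = c("condition", "lif", "control"), alpha = 0.05)

# Same genes and same two groups, but size factors and dispersion estimates
# come from different sample sets, so padj generally differs.
summary(res_all$padj - res_sub$padj)
```

In the single-object fit, the il6 and cytokine samples also contribute to the size factors and the dispersion estimates, which is why the lif vs control numbers are not the same as in the four-sample fit.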

I would start by checking how the data look. Make a single dds object, run vst and conduct a PCA as described in the manual. You can post the plot here if you want to get feedback.
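The check could look roughly like this, assuming DESeq2 is installed. The counts are again simulated placeholders (an assumption for illustration); with real data, `sim_counts` would be the actual count matrix.

```r
library(DESeq2)

set.seed(1)
sample <- c("con1", "con2", "lif1", "lif2", "il6.1", "il6.2", "cyto1", "cyto2")
condition <- factor(rep(c("control", "lif", "il6", "cytokine"), each = 2))
metadata <- data.frame(sample, condition, row.names = sample)

# Simulated counts (hypothetical); replace with the real matrix.
n_genes <- 2000
mu <- 2^runif(n_genes, 4, 12)
sim_counts <- matrix(rnbinom(n_genes * 8, mu = mu, size = 1 / (4 / mu + 0.1)),
                     ncol = 8,
                     dimnames = list(paste0("gene", seq_len(n_genes)), sample))
storage.mode(sim_counts) <- "integer"

# A single object with all samples.
dds <- DESeqDataSetFromMatrix(sim_counts, colData = metadata, design = ~ condition)

# Variance-stabilizing transformation, blind to the design for QC purposes.
vsd <- vst(dds, blind = TRUE)

# PCA coordinates as a data frame, plus the default plot coloured by condition.
pca_data <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
plotPCA(vsd, intgroup = "condition")
```

If replicates of each condition sit close together and no group is far more scattered than the others, that supports keeping all samples in one object.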


Here is the PCA plot for all samples. FYI, the cytokine samples are labelled osm.

[Image: PCA plot of all samples]


The groups look fairly similar in terms of how the replicates cluster together. It is probably simplest to keep all samples in one dds and then contrast groups as needed.


Thank you very much for the kind help.


I am sorry, this p-value problem might be an easy one, but as a biologist I am having difficulty explaining it to my PI (also a biologist). Here is my problem. I followed this code, changing the treatment names.

res <- results( dds, contrast = c("treatment", "DPN", "Control") )
res
resSig <- res[ which(res$padj < 0.05 ), ]

My PI is telling me that a significant gene should have the same p-value and padj in res and in resSig. He thinks we could also just filter the data by p-value or padj in Excel. I ran the results function a few times, but I do not get the same p-value and padj for the significant genes. How can I explain this? Or, as he says, am I doing something wrong here? Thank you for your help.


Hello, thank you for the help. Can you help me with the following padj problem? Is any multiple testing correction applied when going from res to resSig? Thank you very much.


I do not understand the question, please try to rephrase / explain better.


Hello, here is my question. After I get the significant gene list with the padj < 0.05 cutoff, I see that the statistics (stat, p-value, padj) have changed. But I expected the statistics for a given gene to stay the same, since I only filtered these genes out of the full set based on their padj value. Did I misunderstand something? Is there anything wrong with selecting significant genes based on padj alone?

Thank you very much.
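For reference, a minimal base-R sketch of what row filtering does and does not change. The p-values are made up for illustration, and `p.adjust(..., method = "BH")` stands in here for the Benjamini-Hochberg adjustment that DESeq2's `results()` uses by default; this is not the poster's data or exact pipeline.

```r
# Toy p-values (invented for illustration only).
res <- data.frame(
  gene   = paste0("gene", 1:6),
  pvalue = c(0.0001, 0.004, 0.012, 0.03, 0.2, 0.7)
)

# Adjust once, over the full set of tests.
res$padj <- p.adjust(res$pvalue, method = "BH")

# Filtering is plain row subsetting: values in the kept rows are untouched.
res_sig <- res[which(res$padj < 0.05), ]
unchanged <- all.equal(res_sig$padj,
                       res$padj[match(res_sig$gene, res$gene)])

# Re-running the adjustment on the subset, by contrast, gives new numbers,
# because the correction depends on how many tests it is applied to.
readjusted <- p.adjust(res_sig$pvalue, method = "BH")

print(res_sig)
print(unchanged)
print(readjusted)
```

Subsetting alone leaves each gene's row unchanged; only re-running the adjustment on the smaller set, or rebuilding the results from a different set of samples, alters padj.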
