Question

Finding DE genes in a group compared to all groups

3

Entering edit mode

6.6 years ago

ytlin610 ▴ 70

Hi,

I'm working on the DE analysis with DESeq2, here's my experimental design:

 Genotype       Time          Replicates
 Wild_type       6hr          3 for each
 Wild_type      10hr        
 Wild_type      12hr      
    cht7         6hr
    cht7        10hr      
    cht7        12hr  
   CHT7HA        6hr         
   CHT7HA       10hr                       
   CHT7HA       12hr

With the following code, I'm able to generate the result tables comparing two different groups of genotype+time (e.g. Wild_type_6hr_vs_cht7_6hr):

 Salmon_dds <-DESeqDataSetFromTximport(gene_quant, 
                                          colData =  sampleTable,
                                          design = ~genotype + time + time:genotype)

    Salmon_dds$genotype <- factor(Salmon_dds$genotype, levels= c("Wild_type","cht7", "CHT7HA"))
    Salmon_dds$time <- factor(Salmon_dds$time, levels = c("6","10","12"))
    Salmon_dds$group <-factor(paste0(Salmon_dds$genotype, Salmon_dds$time))
    design(Salmon_dds1) <- ~group

    results(Salmon_dds, contrast = c("group", "cht76", "Wild_type6"), tidy = TRUE)

If I understand correctly, the resulting table can help me to identify the DE genes in cht7_6 compared to wild_type_6.

However, I also want to pick a specific group and compare it to all of the groups to find the DE genes, e.g. genes in cht7_6hr which have a greater than 2-fold expression difference (and padj <0.05) from their mean expression of all samples (BaseMean). Is there any way to generate a table like this?

The reason for doing this is to to get a list of DE genes which are differentially expressed in at least one group across all groups so I can do the follow-up gene clustering.

Thanks!

DESeq2 rna-seq R • 2.2k views

ADD COMMENT • link 6.6 years ago by ytlin610 ▴ 70

1

Entering edit mode

Well I have also recently did analysis on interaction type of design in DESeq2, what I would do in this case, I would use contrast and perform comparison keeping one condition as reference (as you are doing) and extract all the genes that are DE (log2FC >=1 || log2FC <=-1 && padj <=0.05) and combine the list. Further, I would extract normalized count value either in terms of rld or vst and perform gene clustering. This is just an suggestion, others may suggest better answer.

ADD REPLY • link 6.6 years ago by Prakash ★ 2.2k

0

Entering edit mode

Thanks for the answer! It's a good idea to take one condition (e.g. wild type at 6hr) as the reference for others. The reason I wanted to do the one-vs-all (as described in my question) comparison is that there're a lot of papers using this method to get the DEGs for gene clustering, and they used DESeq2 to do that. Anyway, I'll try to get the DE genes as you suggested, thanks a lot!

ADD REPLY • link 6.6 years ago by ytlin610 ▴ 70

score 2 · Answer 1 · 2018-10-02

Updated: I finally figured out how to do it!

It's mentioned in this post: https://support.bioconductor.org/p/96016/

The key point is to add betaPrior = TRUE when running DESeq.

For example, by running:

Salmon_dds1 <-DESeq(Salmon_dds1, betaPrior = TRUE)

...will give results names as:

> resultsNames(Salmon_dds1)
 [1] "Intercept"        "groupcht710"      "groupcht712"      "groupcht76"       "groupCHT7HA10"   
 [6] "groupCHT7HA12"    "groupCHT7HA6"     "groupWild_type10" "groupWild_type12" "groupWild_type6"

The groups are separated successfully, and I can get the result of a single group (comparing to the BaseMean) instead of comparing to other groups:

> head(results(Salmon_dds1, name = "groupWild_type6"))
log2 fold change (MAP): groupWild type6 
Wald test p-value: groupWild type6 
DataFrame with 6 rows and 6 columns
                      baseMean log2FoldChange     lfcSE        stat       pvalue         padj
                     <numeric>      <numeric> <numeric>   <numeric>    <numeric>    <numeric>
Cre38.g759997.t1.1  17.4794969     0.43888426 0.6381224   0.6877744 4.915949e-01 6.126644e-01
Cre08.g358100.t1.1 302.3180818    -3.62919097 0.2562839 -14.1608215 1.601115e-45 6.873016e-44
Cre08.g358126.t1.1   0.2533769     0.13215216 0.3769825   0.3505525 7.259241e-01           NA
Cre08.g358150.t1.2   0.2666042    -0.04461779 0.3769895  -0.1183529 9.057881e-01           NA
Cre08.g358200.t1.1 470.3191587    -0.27601193 0.3454237  -0.7990533 4.242595e-01 5.483249e-01
Cre08.g358200.t2.1 210.2892166     0.13084803 0.6564644   0.1993224 8.420106e-01 8.905721e-01

The explanation of using betaPrior = TRUE is also provided in the post ( by Dr. Michael Love ).