Question

DEG for multiple comparisons

0

Entering edit mode

3.3 years ago

randlkc • 0

Hi all,

I'm quite new to using DESeq2 and was hoping for some guidance. I appreciate my question might be very basic - apologies for this.

I have an experimental set of six groups (24 samples in total, n = 4) and am looking at creating different comparisons. I've read the DESeq2 vignette and have made my contrast like so:

 dds <- DESeqDataSetFromTximport(Txi_gene, colData=sample_file, design=~group)
 dds <- DESeq(dds)
 dds$group <- factor(paste0(dds$condition, dds$time))
 design(dds) <- ~ group

res1 <- results(dds, contrast=c("group","A_day14","B_day14"))
res2 <- results(dds, contrast=c("group","A_day14","A_day0"))

*etc.*

This all seems to be looking as expected. My next question is about visualizing the differentially expressed genes: if I'm looking at comparison A vs B, how can I extract just those DEGs rather than the genes in general?

This is my code chunk so far:

normalized_counts <- counts(dds, normalized=TRUE)

normalized_counts <- normalized_counts %>% 
  data.frame(check.names=FALSE) %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()

top50_sigRes1_genes <- res1_lfcshrink_tb %>% 
        arrange(padj) %>% 
        pull(gene) %>%
        head(n=50)

top50_sigRes1_norm <- normalized_counts %>%
        dplyr::filter(gene %in% top50_sigRes1_genes)

gathered_top50_sigRes1 <- top50_sigRes1_norm %>%
  gather(colnames(top50_sigRes1_norm)[2:25], key = "samplename", value = "normalized_counts")

My hunch is that I need to change the [2:25] to reflect only the columns I'm interested in per comparison? So if I look at A vs B, I only select those columns and disregard C, D, etc. However, when I try that I get the error ! incorrect number of dimensions.

So for visualization:

gathered_top50_sigRes1_joined <- inner_join(meta_data, gathered_top50_sigRes1)

ggplot(gathered_top50_sigRes1_joined) +
        geom_point(aes(x = gene, y = normalized_counts, color = group)) +
        scale_y_log10() +
        xlab("Genes") +
        ylab("log10 Normalized Counts") +
        ggtitle("Top 50 Significant DE Genes - A day 14 vs B day 14") +
        theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    theme(plot.title = element_text(hjust = 0.5))

This gives me a nice plot, but all my samples are included in it so it's not specific to the comparison I'm looking at.

I hope my question makes sense and thank you in advance for taking the time to help me!

expression DESEq2 differential RNASeq DEG • 1.3k views

ADD COMMENT • link 3.3 years ago by randlkc • 0

0

Entering edit mode

I only select those columns and disregard C, D, etc. However, when I try that I get the error ! incorrect number of dimensions.

Can you share the line of code that fails for this?

ADD REPLY • link 3.3 years ago by Friederike 9.0k

0

Entering edit mode

Hi Friederike, thanks for your reply! Please see the code:

 gathered_top50_sigRes1 <- top50_sigRes1_norm %>%
 gather(colnames(top50_sigRes1_norm)[2:3,5,7,11], key = "samplename", value = "normalized_counts")

So here I've indicated columns 3,5,7,11 because those are the samples I'm interested in. This gives me the following error:

Error in `colnames(top50_sigRes1_norm)[2:3, 5, 7, 11]`:
! incorrect number of dimensions
Backtrace:
  1. top50_sigRes1_norm %>% ...
 22. base::.handleSimpleError(...)
 23. rlang h(simpleError(msg, call))
 24. handlers[[1L]](cnd)

This is what my top50_sigRes1_norm looks like:

> head(top50_sigRes1_norm)
# A tibble: 6 × 25
  gene     R1ESD0S8 R1ESD14S5 R1FWD0S24 R1FWD14S21 R1TCPD0S22 R1TCPD14S12 R2ESD0S16 R2ESD14S9 R2FWD0S13 R2FWD14S23 R2TCPD0S20 R2TCPD14S10 R3ESD0S11 R3ESD14S6
  <chr>       <dbl>     <dbl>     <dbl>      <dbl>      <dbl>       <dbl>     <dbl>     <dbl>     <dbl>      <dbl>      <dbl>       <dbl>     <dbl>     <dbl>
1 ACTN1     11878.     14667.     8403.     6059.     10734.       8492.     8585.    15186.     9995.      7586.      14920.       7357.    12047.    13620.
2 ALDH16A1    342.       338.      297.      468.       269.        275.      326.      324.      365.       473.        283.        308.      331.      360.
3 AP3D1       157.       100.      286.      678.       150.         71.4     133.       84.1     315.       444.        139.        103.      142.      144.
4 BBOF1       158.       179.      193.       55.4      113.         84.2      94.5      85.1     109.        74.3       108.        159.      108.      153.
5 C1QTNF5     380.       486.      511.      179.       386.        258.      374.      213.      399.       180.        447.        320.      316.      299.
6 C22orf46     39.1        0         0         0         39.1         0         0        99.6      44.2        0           0         120.        0       205.

ADD REPLY • link 3.3 years ago by randlkc • 0

1

Entering edit mode

colnames(top50_sigRes1_norm)[2:3,5,7,11] needs to be colnames(top50_sigRes1_norm)[c(2:3,5,7,11)]

ADD REPLY • link 3.3 years ago by Friederike 9.0k

0

Entering edit mode

Thanks Frederieke. It seems to be the right direction, however, now my gathered_top50_sigRes1 output includes everything except those columns I specified?

Edit: in the downstream code this seems to have achieved exactly what I wanted! Thank you!

ADD REPLY • link 3.3 years ago by randlkc • 0