DEG for multiple comparisons
0
0
Entering edit mode
2.7 years ago
randlkc • 0

Hi all,

I'm quite new to using DESeq2 and was hoping for some guidance. I appreciate my question might be very basic - apologies for this.

I have an experimental set of six groups (24 samples in total, n = 4) and am looking at creating different comparisons. I've read the DESeq2 vignette and have made my contrast like so:

 dds <- DESeqDataSetFromTximport(Txi_gene, colData=sample_file, design=~group)
 dds <- DESeq(dds)
 dds$group <- factor(paste0(dds$condition, dds$time))
 design(dds) <- ~ group

res1 <- results(dds, contrast=c("group","A_day14","B_day14"))
res2 <- results(dds, contrast=c("group","A_day14","A_day0"))

*etc.*

This all seems to be looking as expected. My next question is about visualizing the differentially expressed genes: if I'm looking at comparison A vs B, how can I extract just those DEGs rather than the genes in general?

This is my code chunk so far:

normalized_counts <- counts(dds, normalized=TRUE)

normalized_counts <- normalized_counts %>% 
  data.frame(check.names=FALSE) %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()

top50_sigRes1_genes <- res1_lfcshrink_tb %>% 
        arrange(padj) %>% 
        pull(gene) %>%
        head(n=50)

top50_sigRes1_norm <- normalized_counts %>%
        dplyr::filter(gene %in% top50_sigRes1_genes)

gathered_top50_sigRes1 <- top50_sigRes1_norm %>%
  gather(colnames(top50_sigRes1_norm)[2:25], key = "samplename", value = "normalized_counts")

My hunch is that I need to change the [2:25] to reflect only the columns I'm interested in per comparison? So if I look at A vs B, I only select those columns and disregard C, D, etc. However, when I try that I get the error ! incorrect number of dimensions.

So for visualization:

gathered_top50_sigRes1_joined <- inner_join(meta_data, gathered_top50_sigRes1)

ggplot(gathered_top50_sigRes1_joined) +
        geom_point(aes(x = gene, y = normalized_counts, color = group)) +
        scale_y_log10() +
        xlab("Genes") +
        ylab("log10 Normalized Counts") +
        ggtitle("Top 50 Significant DE Genes - A day 14 vs B day 14") +
        theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    theme(plot.title = element_text(hjust = 0.5))

This gives me a nice plot, but all my samples are included in it so it's not specific to the comparison I'm looking at.

I hope my question makes sense and thank you in advance for taking the time to help me!

expression DESEq2 differential RNASeq DEG • 981 views
ADD COMMENT
0
Entering edit mode

I only select those columns and disregard C, D, etc. However, when I try that I get the error ! incorrect number of dimensions.

Can you share the line of code that fails for this?

ADD REPLY
0
Entering edit mode

Hi Friederike, thanks for your reply! Please see the code:

 gathered_top50_sigRes1 <- top50_sigRes1_norm %>%
 gather(colnames(top50_sigRes1_norm)[2:3,5,7,11], key = "samplename", value = "normalized_counts")

So here I've indicated columns 3,5,7,11 because those are the samples I'm interested in. This gives me the following error:

Error in `colnames(top50_sigRes1_norm)[2:3, 5, 7, 11]`:
! incorrect number of dimensions
Backtrace:
  1. top50_sigRes1_norm %>% ...
 22. base::.handleSimpleError(...)
 23. rlang h(simpleError(msg, call))
 24. handlers[[1L]](cnd)

This is what my top50_sigRes1_norm looks like:

> head(top50_sigRes1_norm)
# A tibble: 6 × 25
  gene     R1ESD0S8 R1ESD14S5 R1FWD0S24 R1FWD14S21 R1TCPD0S22 R1TCPD14S12 R2ESD0S16 R2ESD14S9 R2FWD0S13 R2FWD14S23 R2TCPD0S20 R2TCPD14S10 R3ESD0S11 R3ESD14S6
  <chr>       <dbl>     <dbl>     <dbl>      <dbl>      <dbl>       <dbl>     <dbl>     <dbl>     <dbl>      <dbl>      <dbl>       <dbl>     <dbl>     <dbl>
1 ACTN1     11878.     14667.     8403.     6059.     10734.       8492.     8585.    15186.     9995.      7586.      14920.       7357.    12047.    13620.
2 ALDH16A1    342.       338.      297.      468.       269.        275.      326.      324.      365.       473.        283.        308.      331.      360.
3 AP3D1       157.       100.      286.      678.       150.         71.4     133.       84.1     315.       444.        139.        103.      142.      144.
4 BBOF1       158.       179.      193.       55.4      113.         84.2      94.5      85.1     109.        74.3       108.        159.      108.      153.
5 C1QTNF5     380.       486.      511.      179.       386.        258.      374.      213.      399.       180.        447.        320.      316.      299.
6 C22orf46     39.1        0         0         0         39.1         0         0        99.6      44.2        0           0         120.        0       205.
ADD REPLY
1
Entering edit mode

colnames(top50_sigRes1_norm)[2:3,5,7,11] needs to be colnames(top50_sigRes1_norm)[c(2:3,5,7,11)]

ADD REPLY
0
Entering edit mode

Thanks Frederieke. It seems to be the right direction, however, now my gathered_top50_sigRes1 output includes everything except those columns I specified?

Edit: in the downstream code this seems to have achieved exactly what I wanted! Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1468 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6