Strange RNA-Seq Results When Adding Covariates
0
0
Entering edit mode
12 months ago

Hi all,

I am performing an RNA-sequencing experiment using primary human fibroblasts. The study is a simple case/control study with a treated and untreated condition.

I have begun my analyses by comparing case/control while disregarding treatment. When I don't include covariates I get 90 significant genes out of a count matrix of ~26k. These seem to make biological sense.

However, when I include age as a covariate, I suddenly get almost 9k significant genes. I have also generated histograms of the age distribution in the cases vs controls and they seem to be relatively similar.

It is also worth noting that HOX genes are a confounder in my experiment. However, I am unsure why they cropped up as differentially expressed when accounting for age but when I didn't include age in the model.

Would you have any ideas as to why this has happened?

Code below written in rstudio.

Analysis Without Covariate

dds <- DESeqDataSetFromMatrix(counts, meta, design = ~condition)
dds <- DESeq(dds)

contrast_condition <- c("condition", "case", "control")

#Extract results
res_table <- results(dds, contrast = contrast_condition, alpha = 0.05)

#Apply LFC shrinkage 
resultsNames(dds)
res_table <- lfcShrink(dds, coef = "condition_case_vs_control", type = "apeglm")

res_table <- res_table %>%
  data.frame() %>%
  rownames_to_column(var = "gene") %>%
  tibble()

#Extract significant results
sig_res <- res_table %>%
  dplyr::filter(padj < 0.05)

Top Sig Genes No Covariate

no covariate

Analysis with Age as A Covariate

#Scale and centre the age variable as numeric variables with high mean can impair the model fit
meta$age_months_scaled <- scale(meta$age_months, center = TRUE)

#Remove NAs from age$months as DESeq can't handle NAs
meta_na.rm <- meta[!is.na(meta$age_months), ]

counts_na.rm <- counts[ , meta_na.rm$sample_name]

#Run the analysis on DESeq dataset
dds_age <- DESeqDataSetFromMatrix(counts_na.rm, meta_na.rm, design = ~age_months_scaled + condition)

dds_age <- DESeq(dds_age)

#Define contrast
contrast_condition <- c("condition", "case", "control")

#Extract results
res_table_age <- results(dds_age, contrast = contrast_condition, alpha = 0.05)

#Apply LFC shrinkage 
resultsNames(dds_age)
res_table_age <- lfcShrink(dds_age, coef = "condition_case_vs_control", type = "apeglm")
#No LFC shrinkage errors

res_table_age <- res_table_age %>%
  data.frame() %>%
  rownames_to_column(var = "gene") %>%
  tibble()

#Extract significant results
sig_res_age <- res_table_age %>%
  dplyr::filter(padj < 0.05)

Top Sig Genes With Covariate

enter image description here

covariates RNA-Seq DESeq2 • 536 views
ADD COMMENT
1
Entering edit mode

It would probably be worth doing a bit of data exploration with e.g. PCA to better understand the factors contributing to the variance in your data. This becomes more important as you start adding covariates to a model.

ADD REPLY

Login before adding your answer.

Traffic: 1977 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6