DESeq2 experiment design
7 months ago

I am writing a DESeq2 script to compare wild-type p53 and mutant p53 samples. My WT group contains 3 samples with 3, 3, and 2 replicates, and my mutant group contains 4 samples with 3 replicates each. Do I need the same number of samples and replicates in both conditions, or can I use this design as it is?

# Load DESeq2 library
library(DESeq2)

# Run accessions for each condition
wt_samples <- c("SRR8435995", "SRR8435996", "SRR8435997",
                "SRR19159298", "SRR19159299", "SRR19159300",
                "SRR24572364", "SRR24572365")

mut_samples <- c("SRR8435992", "SRR8435993", "SRR8435994",
                 "SRR22192978", "SRR22192979", "SRR22192981",
                 "SRR22729521", "SRR22729522", "SRR22729523",
                 "SRR24442519", "SRR24442520", "SRR24442521")

# Read count data for all samples
wt_counts <- lapply(wt_samples, function(sample_id) {
  read.delim(paste0(sample_id, ".csv"), row.names = 1)
})

mut_counts <- lapply(mut_samples, function(sample_id) {
  read.delim(paste0(sample_id, ".csv"), row.names = 1)
})

# Combine runs into one count matrix with one column per run
# (assumes each file holds a single count column and lists the same genes in the same order)
all_counts <- do.call(cbind, c(wt_counts, mut_counts))
colnames(all_counts) <- c(wt_samples, mut_samples)

# Number of replicates in each sample
wt_replicates <- c(3, 3, 2)  # Number of replicates for each wild type sample
mut_replicates <- rep(3, 4)   # Number of replicates for each mutant sample

# Create sample metadata: one row per run, in the same order as the count columns
# (the WT1..WT3 and Mut1..Mut4 labels are placeholders for the real sample names)
sample_metadata <- data.frame(
  sampleName = rep(c(paste0("WT", 1:3), paste0("Mut", 1:4)),
                   times = c(wt_replicates, mut_replicates)),
  condition = factor(c(rep("WT", 8), rep("Mutant", 12)), levels = c("WT", "Mutant")),
  replicate = unlist(lapply(c(wt_replicates, mut_replicates), seq_len)),
  row.names = c(wt_samples, mut_samples)
)

# Create DESeqDataSet; replicate numbers are arbitrary labels, so only condition enters the design
dds <- DESeqDataSetFromMatrix(countData = all_counts,
                              colData = sample_metadata,
                              design = ~ condition)
RNA-seq R DESeq2 • 283 views
7 months ago
dthorbur ★ 2.5k

I'm a little confused why you posted your entire script but then didn't refer to it in your question.

Having a different number of samples in your treatments is fine, and is fairly common either due to experimental design or because samples fail. Here is a similar post on this forum from a few years ago with some advice that is relevant.
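
As a quick sanity check that unequal group sizes are not a problem for the model itself, here is a minimal base R sketch. It only uses the group sizes from the question (8 WT runs, 12 mutant runs); the variable names are illustrative and no count data is involved.

```r
# Condition labels for 8 WT and 12 mutant runs, matching the group sizes in the question
condition <- factor(c(rep("WT", 8), rep("Mutant", 12)), levels = c("WT", "Mutant"))

# Group sizes are unequal
group_sizes <- table(condition)
print(group_sizes)

# Design matrix for ~ condition: an intercept (WT baseline) plus a Mutant-vs-WT column
design_matrix <- model.matrix(~ condition)

# The model coefficients are estimable when the design matrix has full column rank
is_full_rank <- qr(design_matrix)$rank == ncol(design_matrix)
print(is_full_rank)
```

The rank check depends only on which groups are present, not on how many runs each group has, which is why an unbalanced layout like this one still gives a usable design.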

I would inspect the samples with a PCA after mapping, both for abundance and possibly for presence/absence. If your replicates don't cluster together then you may have a problem; this is true whether you have 2 or 3 replicates.
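
A rough sketch of the abundance PCA in base R is below. The count matrix is simulated toy data standing in for real counts, and the log2 counts-per-million transform is just one simple choice that I am assuming here; DESeq2 itself provides `vst()` and `plotPCA()` for the same kind of check on a real `dds` object.

```r
# Toy count matrix standing in for real data: 200 genes x 6 runs (values are simulated)
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("run", 1:6)))
condition <- factor(rep(c("WT", "Mutant"), each = 3), levels = c("WT", "Mutant"))

# Scale each run to counts per million, then log-transform to stabilise the variance
cpm <- t(t(counts) / colSums(counts)) * 1e6
log_cpm <- log2(cpm + 1)

# PCA on samples: prcomp expects observations in rows, so transpose the matrix
pca <- prcomp(t(log_cpm), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Plot the first two components, coloured by condition and labelled by run
plot(pca$x[, 1], pca$x[, 2], col = condition, pch = 19,
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.7)
legend("topright", legend = levels(condition),
       col = seq_along(levels(condition)), pch = 19)
```

With real data you would swap in your own count matrix and condition labels, then look at whether replicates of the same sample sit close together on the plot.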
