What is the next step after getting the raw counts matrix for RNA-seq data?
0
0
Entering edit mode
22 days ago

I have created the raw counts matrix using STAR and RSEM. Now I want to get the DEGs. Should I directly use this raw counts for DESeq without any filtering or such?

library(DESeq2)
data = read.csv("counts.csv")
rownames(data) <- data[,1]
data <- data[,-1]

# Define metadata for the six groups
coldata <- data.frame(
  row.names = colnames(data),
  condition = factor(c(rep("CC", 4), rep("CT", 4), rep("CL", 4), 
                       rep("IC", 4), rep("IT", 4), rep("IL", 4)))
)

# Check the factor levels
levels(coldata$condition)  # It should show: "CC", "CT", "CL", "IC", "IT", "IL"

# Create DESeq2 dataset
dds <- DESeqDataSetFromMatrix(countData = data, colData = coldata, design = ~ condition)

# Run DESeq2
dds <- DESeq(dds)
# Compare CC vs IC
res_CC_IC <- results(dds, contrast = c("condition", "IC", "CC"))

# Compare CT vs IT
res_CT_IT <- results(dds, contrast = c("condition", "IT", "CT"))

# Compare CL vs IL
res_CL_IL <- results(dds, contrast = c("condition", "IL", "CL"))
# Summary of results for each comparison
summary(res_CC_IC)
summary(res_CT_IT)
summary(res_CL_IL)

# Remove rows with NA in either log2FoldChange or padj
res_CC_IC_clean <- res_CC_IC[!is.na(res_CC_IC$log2FoldChange) & !is.na(res_CC_IC$padj), ]

# Apply log2 fold change cutoff of 2 and padj < 0.05
res_CC_IC_filtered <- res_CC_IC_clean[abs(res_CC_IC_clean$log2FoldChange) > 2 & res_CC_IC_clean$padj < 0.05, ]

# View the filtered results
summary(res_CC_IC_filtered)
plotCounts(dds, gene="ENSMUSG00000057666")

# Get the normalized counts
normalized_counts <- counts(dds, normalized = TRUE)

# Save the normalized counts to a CSV file
write.csv(normalized_counts, "normalized_counts.csv")
write.csv(res_CC_IC_filtered,file="CCvsIC_normal.csv")

Is this a right code to check the DEGs between the CC and IC groups?

DESEq2 transriptomics RNA-seq • 303 views
ADD COMMENT
1
Entering edit mode

your question title is a bit of misnomer: "What is the next step after getting the raw counts matrix for RNASeq data?"

evidently you do know what the next step is :-)

your question is whether you should apply additional filtering to your data.

you could do things like remove rows that have fewer than X number of measurements

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
ADD REPLY
0
Entering edit mode

These sorts of filter essentially only remove genes that are all-zero. A better filter is in the DESeq2 vignette, or simply usr filterByExpr in edgeR.

ADD REPLY
0
Entering edit mode

Your code looks good. In addition you can also add lfcShrink function to shrink log fold changes association with condition:

res = lfcShrink(dds, coef="condition_trt_vs_untrt", type="apeglm")
ADD REPLY

Login before adding your answer.

Traffic: 2340 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6