Entering edit mode
22 days ago
rajdeepboral00
▴
60
I have created the raw counts matrix using STAR and RSEM. Now I want to get the DEGs. Should I directly use this raw counts for DESeq without any filtering or such?
library(DESeq2)
data = read.csv("counts.csv")
rownames(data) <- data[,1]
data <- data[,-1]
# Define metadata for the six groups
coldata <- data.frame(
row.names = colnames(data),
condition = factor(c(rep("CC", 4), rep("CT", 4), rep("CL", 4),
rep("IC", 4), rep("IT", 4), rep("IL", 4)))
)
# Check the factor levels
levels(coldata$condition) # It should show: "CC", "CT", "CL", "IC", "IT", "IL"
# Create DESeq2 dataset
dds <- DESeqDataSetFromMatrix(countData = data, colData = coldata, design = ~ condition)
# Run DESeq2
dds <- DESeq(dds)
# Compare CC vs IC
res_CC_IC <- results(dds, contrast = c("condition", "IC", "CC"))
# Compare CT vs IT
res_CT_IT <- results(dds, contrast = c("condition", "IT", "CT"))
# Compare CL vs IL
res_CL_IL <- results(dds, contrast = c("condition", "IL", "CL"))
# Summary of results for each comparison
summary(res_CC_IC)
summary(res_CT_IT)
summary(res_CL_IL)
# Remove rows with NA in either log2FoldChange or padj
res_CC_IC_clean <- res_CC_IC[!is.na(res_CC_IC$log2FoldChange) & !is.na(res_CC_IC$padj), ]
# Apply log2 fold change cutoff of 2 and padj < 0.05
res_CC_IC_filtered <- res_CC_IC_clean[abs(res_CC_IC_clean$log2FoldChange) > 2 & res_CC_IC_clean$padj < 0.05, ]
# View the filtered results
summary(res_CC_IC_filtered)
plotCounts(dds, gene="ENSMUSG00000057666")
# Get the normalized counts
normalized_counts <- counts(dds, normalized = TRUE)
# Save the normalized counts to a CSV file
write.csv(normalized_counts, "normalized_counts.csv")
write.csv(res_CC_IC_filtered,file="CCvsIC_normal.csv")
Is this a right code to check the DEGs between the CC and IC groups?
your question title is a bit of misnomer: "What is the next step after getting the raw counts matrix for RNASeq data?"
evidently you do know what the next step is :-)
your question is whether you should apply additional filtering to your data.
you could do things like remove rows that have fewer than X number of measurements
These sorts of filter essentially only remove genes that are all-zero. A better filter is in the DESeq2 vignette, or simply usr filterByExpr in edgeR.
Your code looks good. In addition you can also add
lfcShrink
function to shrink log fold changes association with condition: