Question

What is the next step after getting the raw counts matrix for RNA-seq data?

0

Entering edit mode

5 months ago

rajdeepboral00 ▴ 70

I have created the raw counts matrix using STAR and RSEM. Now I want to get the DEGs. Should I directly use this raw counts for DESeq without any filtering or such?

library(DESeq2)
data = read.csv("counts.csv")
rownames(data) <- data[,1]
data <- data[,-1]

# Define metadata for the six groups
coldata <- data.frame(
  row.names = colnames(data),
  condition = factor(c(rep("CC", 4), rep("CT", 4), rep("CL", 4), 
                       rep("IC", 4), rep("IT", 4), rep("IL", 4)))
)

# Check the factor levels
levels(coldata$condition)  # It should show: "CC", "CT", "CL", "IC", "IT", "IL"

# Create DESeq2 dataset
dds <- DESeqDataSetFromMatrix(countData = data, colData = coldata, design = ~ condition)

# Run DESeq2
dds <- DESeq(dds)
# Compare CC vs IC
res_CC_IC <- results(dds, contrast = c("condition", "IC", "CC"))

# Compare CT vs IT
res_CT_IT <- results(dds, contrast = c("condition", "IT", "CT"))

# Compare CL vs IL
res_CL_IL <- results(dds, contrast = c("condition", "IL", "CL"))
# Summary of results for each comparison
summary(res_CC_IC)
summary(res_CT_IT)
summary(res_CL_IL)

# Remove rows with NA in either log2FoldChange or padj
res_CC_IC_clean <- res_CC_IC[!is.na(res_CC_IC$log2FoldChange) & !is.na(res_CC_IC$padj), ]

# Apply log2 fold change cutoff of 2 and padj < 0.05
res_CC_IC_filtered <- res_CC_IC_clean[abs(res_CC_IC_clean$log2FoldChange) > 2 & res_CC_IC_clean$padj < 0.05, ]

# View the filtered results
summary(res_CC_IC_filtered)
plotCounts(dds, gene="ENSMUSG00000057666")

# Get the normalized counts
normalized_counts <- counts(dds, normalized = TRUE)

# Save the normalized counts to a CSV file
write.csv(normalized_counts, "normalized_counts.csv")
write.csv(res_CC_IC_filtered,file="CCvsIC_normal.csv")

Is this a right code to check the DEGs between the CC and IC groups?

DESEq2 transriptomics RNA-seq • 628 views

ADD COMMENT • link updated 5 months ago by Istvan Albert 102k • written 5 months ago by rajdeepboral00 ▴ 70

1

Entering edit mode

your question title is a bit of misnomer: "What is the next step after getting the raw counts matrix for RNASeq data?"

evidently you do know what the next step is :-)

your question is whether you should apply additional filtering to your data.

you could do things like remove rows that have fewer than X number of measurements

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

ADD REPLY • link 5 months ago by Istvan Albert 102k

0

Entering edit mode

These sorts of filter essentially only remove genes that are all-zero. A better filter is in the DESeq2 vignette, or simply usr filterByExpr in edgeR.

ADD REPLY • link 5 months ago by ATpoint 88k

0

Entering edit mode

Your code looks good. In addition you can also add lfcShrink function to shrink log fold changes association with condition:

res = lfcShrink(dds, coef="condition_trt_vs_untrt", type="apeglm")

ADD REPLY • link 5 months ago by DareDevil ★ 4.4k