Question

To identify Differentially Expressed Genes and Unraveling Upregulated and Downregulated Genes using DESeq2 in R

0

Entering edit mode

15 months ago

Nelo ▴ 20

Hello!! AIM: To obtain differentially expressed genes and then identifying upregulated and downregulated genes from my analysis, here are the steps I run on R(DESeq2):

 1. counts <- read.delim("counts.csv", header = TRUE, row.names =1, sep
        = ",", check.names=FALSE)
     2. dim(counts)
     3. colnames(counts)
     4. colData <- read.delim("colData.csv", header= TRUE,sep = ",")
     5. dim(colData)
     6. test <- c("OsM5_1G006560", "OsM5_1G007650", "OsM5_2G010660",
        "OsM5_2G013030", "OsM5_2G015620", "OsM5_2G015640", "OsM5_2G025980",
        "OsM5_2G030920", "OsM5_3G012590", "OsM5_3G013640")
     7. test_counts<- counts[test, ]
     8. test_dds <- DESeqDataSetFromMatrix(countData = test_counts, colData
        = colData, design = ~ treatment)
     9. test_dds <- DESeq(test_dds)
     10. test_res <- results(test_dds)
     11. test_vsd <- varianceStabilizingTransformation(test_dds, blind=T)
     12. test_res_ordered <- test_res[order(test_res$padj),]
     13. test_de_genes <-
         rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 &
         abs(test_res_ordered$log2FoldChange) > 1)]
     14. test_de_expression <- assay(test_vsd)[test_de_genes, ]
     15. heatmap.2(test_de_expression,col = redgreen(75),scale =
         "row",keysize = 0.6,symkey = TRUE,density.info = "none",trace =
         "none",cexCol = 1.3,cexRow = 1.3,labRow =
         rownames(test_de_expression),margins = c(9, 13))
     16. upregulated_genes <-
         rownames(test_res_ordered)[test_res_ordered$log2FoldChange > 1
     17. downregulated_genes <-
         rownames(test_res_ordered)[test_res_ordered$log2FoldChange < 1]

For getting Differentially Expressed Genes: The code test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 & abs(test_res_ordered$log2FoldChange) > 1)] is used to obtain differentially expressed genes with an adjusted p-value (padj) less than 0.01 and an absolute log2 fold change greater than 1. This give me a list of genes that are significantly differentially expressed between the conditions being compared. After running this code, a list of differentially expressed genes are:

test_de_genes
OsM5_1G006560
OsM5_2G010660
OsM5_2G015640
OsM5_2G030920

Identifying Upregulated and Downregulated Genes: To get upregulated and downregulated genes, I use the following code:

upregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange > 1

After running this code, separate lists for upregulated genes:

upregulated_genes
OsM5_2G010660
OsM5_2G015640
OsM5_2G025980

downregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange < 1]

After running this code, separate lists for downregulated genes: downregulated_genes OsM5_1G006560 OsM5_1G007650 OsM5_2G013030 OsM5_2G015620

If look closely, the list of differentially expressed genes obtained from test_de_genes are not identical and equal to the union of upregulated and downregulated genes(why these lists differmight be because not all differentially expressed genes will be upregulated or downregulated, some may have a significant change in expression but still fall on one side of the log2 fold change threshold. Additionally, some genes may have significant changes in expression but might not meet the strict criteria for being considered either upregulated or downregulated. But, I also heard that it is entirely normal to get differ lists in differential genes expressed list and its directions(UP & DOWN REGULATED) lists.

So, I modify the code a little as an attempt of correction:

As I was using the log fold change value in the criteria to define differentially expressed genes in the test_de_genes variable.

To get Differentially Expressed Genes:

The code test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 & abs(test_res_ordered$log2FoldChange) > 1)] is indeed used to obtain differentially expressed genes with an adjusted p-value (padj) less than 0.01 and an absolute log2 fold change greater than 1. So, test_de_genes already contains the list of differentially expressed genes based on both statistical significance (adjusted p-value) and magnitude of change (log2 fold change).

Identifying Upregulated and Downregulated Genes: To get upregulated and downregulated genes separately, I updated the following code:

upregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange > 1 & test_res_ordered$padj < 0.01]

downregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange < 1 & test_res_ordered$padj < 0.01]

My question or request is to confirm whether the code I run are in sequence and valid. Do I need any correction. Is it issue with my code or analysis I run ?

Advance thanks to all !!!

R DEGs DESeq2 • 1.9k views

ADD COMMENT • link updated 15 months ago by Kevin Blighe 88k • written 15 months ago by Nelo ▴ 20

score 2 · Accepted Answer · 2023-08-04

2

Entering edit mode

15 months ago

Kevin Blighe 88k

Hola / Olá,

I will make some affirmatory statements, to ensure that we are aligned.

To define statistical significance, you are using adjusted p-value < 0.01 & absolute log [base 2] fold-change > 1.

Your code to select up-regulated genes is:

rownames(test_res_ordered)[test_res_ordered$log2FoldChange > 1 & test_res_ordered$padj < 0.01]

This --yes-- selects statistically significant up-regulated genes with log [base 2] fold-changes > 1.

Your code to select down-regulated genes is

rownames(test_res_ordered)[test_res_ordered$log2FoldChange < 1 & test_res_ordered$padj < 0.01]

This is incorrect. You need to use -1.

Ciao

Kevin

ADD COMMENT • link 15 months ago by Kevin Blighe 88k

0

Entering edit mode

Hi Kevin.

Thank you for this. I think I made a mistage to get DOWNREGULATED GENE by using just 1 instead of -1. Your suggestion seems correct!!

If that is the case, then in code 13, to get differentially expressed genes:

test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 & abs(test_res_ordered$log2FoldChange) > 1)], here, code is indeed designed to select genes that are upregulated (log2 fold change greater than 1) and have a significant p-value (adjusted p-value less than 0.01). If we only consider the condition abs(test_res_ordered$log2FoldChange) > 1, it will only select genes that have an absolute log2 fold change greater than 1, and this inherently implies upregulated genes.

So, to get just DEGs first without specifying a direction(i.e., forget about their expression direction whether it is up o down-regulated), I think we need to correct the code as:

test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01

Follwing this code, I can separately apply up and down regulated code you suggested to caputred the direction of DEGs I get in code 13.

Please provide your insight on this matter, and I kindly request a response from you, Thank you.

ADD REPLY • link 15 months ago by Nelo ▴ 20

0

Entering edit mode

Hola de nuevo, sí --yes-- this will select both up- and down-regulated genes that are statistically significant:

test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01)]

ADD REPLY • link 15 months ago by Kevin Blighe 88k