Hello!! AIM: To obtain differentially expressed genes and then identifying upregulated and downregulated genes from my analysis, here are the steps I run on R(DESeq2):
1. counts <- read.delim("counts.csv", header = TRUE, row.names =1, sep
= ",", check.names=FALSE)
2. dim(counts)
3. colnames(counts)
4. colData <- read.delim("colData.csv", header= TRUE,sep = ",")
5. dim(colData)
6. test <- c("OsM5_1G006560", "OsM5_1G007650", "OsM5_2G010660",
"OsM5_2G013030", "OsM5_2G015620", "OsM5_2G015640", "OsM5_2G025980",
"OsM5_2G030920", "OsM5_3G012590", "OsM5_3G013640")
7. test_counts<- counts[test, ]
8. test_dds <- DESeqDataSetFromMatrix(countData = test_counts, colData
= colData, design = ~ treatment)
9. test_dds <- DESeq(test_dds)
10. test_res <- results(test_dds)
11. test_vsd <- varianceStabilizingTransformation(test_dds, blind=T)
12. test_res_ordered <- test_res[order(test_res$padj),]
13. test_de_genes <-
rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 &
abs(test_res_ordered$log2FoldChange) > 1)]
14. test_de_expression <- assay(test_vsd)[test_de_genes, ]
15. heatmap.2(test_de_expression,col = redgreen(75),scale =
"row",keysize = 0.6,symkey = TRUE,density.info = "none",trace =
"none",cexCol = 1.3,cexRow = 1.3,labRow =
rownames(test_de_expression),margins = c(9, 13))
16. upregulated_genes <-
rownames(test_res_ordered)[test_res_ordered$log2FoldChange > 1
17. downregulated_genes <-
rownames(test_res_ordered)[test_res_ordered$log2FoldChange < 1]
For getting Differentially Expressed Genes:
The code test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 & abs(test_res_ordered$log2FoldChange) > 1)]
is used to obtain differentially expressed genes with an adjusted p-value (padj) less than 0.01 and an absolute log2 fold change greater than 1. This give me a list of genes that are significantly differentially expressed between the conditions being compared.
After running this code, a list of differentially expressed genes are:
test_de_genes
OsM5_1G006560
OsM5_2G010660
OsM5_2G015640
OsM5_2G030920
Identifying Upregulated and Downregulated Genes: To get upregulated and downregulated genes, I use the following code:
upregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange > 1
After running this code, separate lists for upregulated genes:
upregulated_genes
OsM5_2G010660
OsM5_2G015640
OsM5_2G025980
downregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange < 1]
After running this code, separate lists for downregulated genes: downregulated_genes OsM5_1G006560 OsM5_1G007650 OsM5_2G013030 OsM5_2G015620
If look closely, the list of differentially expressed genes obtained from test_de_genes are not identical and equal to the union of upregulated and downregulated genes(why these lists differmight be because not all differentially expressed genes will be upregulated or downregulated, some may have a significant change in expression but still fall on one side of the log2 fold change threshold. Additionally, some genes may have significant changes in expression but might not meet the strict criteria for being considered either upregulated or downregulated. But, I also heard that it is entirely normal to get differ lists in differential genes expressed list and its directions(UP & DOWN REGULATED) lists.
So, I modify the code a little as an attempt of correction:
As I was using the log fold change value in the criteria to define differentially expressed genes in the test_de_genes variable.
To get Differentially Expressed Genes:
The code test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 & abs(test_res_ordered$log2FoldChange) > 1)] is indeed used to obtain differentially expressed genes with an adjusted p-value (padj) less than 0.01 and an absolute log2 fold change greater than 1. So, test_de_genes already contains the list of differentially expressed genes based on both statistical significance (adjusted p-value) and magnitude of change (log2 fold change).
Identifying Upregulated and Downregulated Genes: To get upregulated and downregulated genes separately, I updated the following code:
upregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange > 1 & test_res_ordered$padj < 0.01]
downregulated_genes <- rownames(test_res_ordered)[test_res_ordered$log2FoldChange < 1 & test_res_ordered$padj < 0.01]
My question or request is to confirm whether the code I run are in sequence and valid. Do I need any correction. Is it issue with my code or analysis I run ?
Advance thanks to all !!!
Hi Kevin.
Thank you for this. I think I made a mistage to get DOWNREGULATED GENE by using just 1 instead of -1. Your suggestion seems correct!!
If that is the case, then in code 13, to get differentially expressed genes:
test_de_genes <- rownames(test_res_ordered)[which(test_res_ordered$padj < 0.01 & abs(test_res_ordered$log2FoldChange) > 1)]
, here, code is indeed designed to select genes that are upregulated (log2 fold change greater than 1) and have a significant p-value (adjusted p-value less than 0.01). If we only consider the conditionabs(test_res_ordered$log2FoldChange) > 1
, it will only select genes that have an absolute log2 fold change greater than 1, and this inherently implies upregulated genes.So, to get just DEGs first without specifying a direction(i.e., forget about their expression direction whether it is up o down-regulated), I think we need to correct the code as:
Follwing this code, I can separately apply up and down regulated code you suggested to caputred the direction of DEGs I get in code 13.
Please provide your insight on this matter, and I kindly request a response from you, Thank you.
Hola de nuevo, sí --yes-- this will select both up- and down-regulated genes that are statistically significant: