Entering edit mode
5 months ago
Rosie
•
0
Hello, I really need help....
I am currently working on doing differential expression analysis of bulk rna sequencing data. However I end of getting very different F, p and p-adj as compared to the report produced by R-shiny. the logFC is quite similar, its like I am getting 1.400 logFC using a R script and R shiny gets 1.3947 logFC on similar one gene.
I am really confused as to why is this happening.
My script I used in R is:
#Read the subread_counts file
x <- read.csv(file = 'filename.csv', header = TRUE, row.names = 1)
#Setting group identifier
group <- factor(c("Co","Co","Co","T","T","T"))
#Creating a DGElist object
dge <- DGEList(counts = x, group = group)
# Filtering lowly expressed genes
keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]
# Normalize the data
dge_norm <- calcNormFactors(dge)
dim(dge_norm)
# Estimate dispersion
Disp_dge <- estimateDisp(dge_norm)
# Fit the model
fit <- glmQLFit(Disp_dge)
# Perform the differential expression analysis using quasi likelihood ratio
result <- glmQLFTest(fit)
# Extract the top differentially expressed genes
top_genes <- topTags(result, n = nrow(result$table))
# Save the results to a CSV file
output_file <- "<pathtodirectory/filename>"
write.csv(top_genes$table, file = output_file)
Are you sure that you set the parameters that Rshiny app is using in its inbuilt function and using the same version of tools that R Shiny have used. There will be changes with regards to parameters set and version of tool that is generally known. Also random seed set influences the result a lot. Based on the random seed set the results vary. If you change the random seed then results inverse.
I am very new to all this stuff. Is there a way I can check what parameters is Shiny app using?? Thank you
Please contact the person who developed the app
May you please show the output of:
Hi, It is not necessarily surprising to have small differences between several DGE methods. But if you applied, with DESeq2 and edgeR for example, a protocol with:
Your p-values should be positively correlated, not necessarily similar but with the same p-values ranking, especially for the top genes example here
If you don't use the same statistical test (Wald test vs LRT) the ranking should be preserved for the top of genes, but no linear correlation. It is not theoretical argument, I have test it in practice