Question

How can I change the fold change based on count per million value to 4?

0

Entering edit mode

5.7 years ago

Keyouer ▴ 10

Hey guys, I am a freshman to learn R... Recently I am using the EdgeR package to perform DEG analysis, here is the R code I am using... table <- as.data.frame(topTags(result, n = nrow(count))) is.DEG <- as.logical(table$FDR < 0.05) DEG.names <- rownames(table)[is.DEG] png("deg.png",width = 640, height = 480) plotSmear(result, de.tags = DEG.names) dev.off()

updeg <- subset(table, FDR <0.05) updeg <- subset(updeg, logFC > 0) uc <- sum(updeg$FDR < 0.05) u <- paste(project,"up") countlist[u] <- uc updeg <- rownames(updeg) write.table(updeg, "updeg.csv", quote=F, append=T,row.names=FALSE)

This is I summarized according to the guidebook and online information, But the thing is I want to change the fold change to more stringently like 4... I would be very appreciative if someone could help me with that...

RNA-Seq • 2.3k views

ADD COMMENT • link 5.5 years ago by Keyouer ▴ 10

0

Entering edit mode

Is this solved? If not please explain why. Also consider upvoting comments that helped. Please also highlight code with the code option 10101.

ADD REPLY • link 5.5 years ago by ATpoint 89k

0

Entering edit mode

Oh, sorry, I posted new questions...

ADD REPLY • link 5.5 years ago by Keyouer ▴ 10

0

Entering edit mode

...and you did not check any online tutorial on filtering in R the last 9 weeks?

Please do that.

Whenever you have a data.frame you can do:

df[df$logFC > 4 & df$logCPM > 2,]

Just convert your topTags object to a data frame and then start subsetting.

ADD REPLY • link 5.5 years ago by ATpoint 89k

ATpoint · Answer 1 · 2019-12-10

0

Entering edit mode

5.7 years ago

ATpoint 89k

I always do:

## get topTags for all genes:
TT    <- topTags(ResultFromQLFTest, n=Inf, adjust.method="BH", sort.by="none")

## tabular output
TT    <- TT$table

## add gene names:
TT.out <- data.frame(Gene = rownames(TT), TT)

## write as TSV to disk:
write.table(sep="\t", quote = FALSE, row.names = FALSE, col.names = TRUE, file = "output.tsv", x = TT.out)

I strongly suggest you always provide as supplement the full output, not just the significant ones to give the reader the ability to scan for all genes. It might also be of interest to see if there is strong evidence against differential expression, therefore provide full topTags output.

ADD COMMENT • link 5.7 years ago by ATpoint 89k

0

Entering edit mode

Thank you so much for your kindly reply. I will try with your code. But sometimes for beginners, like me, even a small change in the code could make me feel confused... Your explanation seems easy to understand. Appreciate...

ADD REPLY • link 5.7 years ago by Keyouer ▴ 10

0

Entering edit mode

You're welcome. Feel free to ask if you have any trouble on the way.

ADD REPLY • link 5.7 years ago by ATpoint 89k

0

Entering edit mode

Yeah, thank you. The code "ResultFromQLFTest" doesn't go well with my current R code, I suppose I may have to define this in advance? I'll post all my code here to be as your reference...

library(grid)
library(gridExtra)
library(gtable)
library(limma)
library(edgeR)
library(topGO)
library(BiocManager)
library(ALL)
data(ALL)
data(geneList)
library(package = affyLib, character.only = TRUE)
count <- read.table("DEG_all.txt", sep = "\t", header = T, row.names = 1)
count <- as.matrix(count)

countlist <- as.list(NULL)

geneID2GO <- readMappings("jgiGO.txt")
GO2geneID <- inverseList(geneID2GO)
geneID2GO <- inverseList(GO2geneID)
geneNames <- names(geneID2GO)
project <- "A_B"
compare <- c("A","B")
#template
listcsv <- paste(project,"_comp.csv")
counts <- count[,compare]
dir.create(project)
setwd(project)


write.table(counts,file=listcsv,sep=",",col.names=T,row.names=T,quote=F)


group <- factor(c("A", "B"))
d <- DGEList(counts = counts, group = group)
d <- calcNormFactors(d)
#d <- estimateGLMCommonDisp(d, method = "deviance", robust = T, subset = NULL)
bcv <- 0.1
result <- exactTest(d,dispersion=bcv^2)


#DEG
table <- as.data.frame(topTags(result, n = nrow(count)))
is.DEG <- as.logical(table$FDR < 0.05)
DEG.names <- rownames(table)[is.DEG]
png("deg.png",width = 640, height = 480)
plotSmear(result, de.tags = DEG.names)
dev.off()

updeg <- subset(table, FDR <0.05)
updeg <- subset(updeg, logFC > 0)
uc <- sum(updeg$FDR < 0.05)
u <- paste(project,"up")
countlist[u] <- uc
updeg <- rownames(updeg)
write.table(updeg, "updeg.csv", quote=F, append=T,row.names=FALSE)

downdeg <- subset(table, FDR <0.05)
downdeg <- subset(downdeg, logFC < 0)
uc <- sum(downdeg$FDR < 0.05)
u <- paste(project,"down")
countlist[u] <- uc

downdeg <- rownames(downdeg)
write.table(downdeg, "downdeg.csv", quote=F, append=T,row.names=FALSE)

ADD REPLY • link updated 5.7 years ago by ATpoint 89k • written 5.7 years ago by Keyouer ▴ 10

0

Entering edit mode

Ok, yeah ResultFromQLFTest in your case would be just result, ignore how I called it. Lets also ignore that you are not estimating but just entering a dispersion estimate and therefore will get absolutely non-reliable results. Rest of the code looks ok (though I did not check every line). Is it working?

ADD REPLY • link 5.7 years ago by ATpoint 89k

0

Entering edit mode

Yes, it worked. Thank you so much. I entered a dispersion estimate because I only have one RNA-seq sample other than several...

ADD REPLY • link 5.7 years ago by Keyouer ▴ 10

0

Entering edit mode

Be aware that this is not reliable as the true dispersion could be much larger due to either biological differences between true replicates or technical issues during library prep, just saying.

ADD REPLY • link 5.7 years ago by ATpoint 89k

0

Entering edit mode

Yes, you are right. Thank you so much...

ADD REPLY • link 5.7 years ago by Keyouer ▴ 10

0

Entering edit mode

Could you please help one more thing? I want to define DEG as the count per million value >=2, foldchange>4, as I really cannot figure out how to do this...

ADD REPLY • link 5.7 years ago by Keyouer ▴ 10