edgeR RNA-seq Results Across Conditions
1
0
Entering edit mode
7.3 years ago
asyndeton17 ▴ 40

Hi,

I recently performed RNA-seq differential expression analysis. I first did a comparison between Condition 2 vs. Control. Then I did Condition 3 vs. Control. However, the raw counts do not seem to be substantiating the output data from edgeR. For instance, the Krt17 gene supposedly has a log fold change of 7 in the comparison between Condition 2 vs. Control, and the Krt17 gene does not show up on the list of differentially expressed genes (at an FDR of 0.05 or lower) for Condition 3 vs. Control. Yet, when I go back to the raw counts for the Krt17 gene, this is what I have: Control replicates(0, 0, 0), Condition 2 (25, 27, 25), and Condition 3 (16, 14, 11). Clearly, it seems to be "overexpressed" in both, but it doesn't show up in the other list. How do I fix this or account for such an observation?

For those wondering, I ran my analysis following most of what the Griffith Lab tutorial suggests.

Thanks

RNA-Seq edgeR • 1.9k views
ADD COMMENT
0
Entering edit mode

Try normalizing the data to a linear scale first then fit a model with limma.

#filter lowly expressed
y <- DGEList(counts = raw_counts)
y <- y[!rowSums(y$counts == 0) == ncol(raw_counts),] 
A <- aveLogCPM(y)
y2 <- y[A>1,]

#library size factor, quantile, and batch normalizations
design <- model.matrix(~0+group)
cont.matrix <- makeContrasts(group1 - group2, 
    group2 - group3, group1 - group3, levels = design)
y3 <- calcNormFactors(y2, method = "TMM")
dge <- voomWithQualityWeights(y3, design=design, 
    normalization="quantile", plot=FALSE)
rbe <- removeBatchEffect(dge, batch)

#fit to linear model
fit <- lmFit(dge, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit3 <- eBayes(fit2, trend = TRUE, robust = TRUE)

#examine results
topTable(fit3, coef=1, num=Inf, sort.by = "P", adjust = "BH")
topTable(fit3, coef=2, num=Inf, sort.by = "P", adjust = "BH")
topTable(fit3, coef=3, num=Inf, sort.by = "P", adjust = "BH")
ADD REPLY
0
Entering edit mode

Would it work if I filtered lowly expressed counts before creating the edgeR object? If so, what sort of cutoff should I make?

ADD REPLY
0
Entering edit mode

It's already taken care of in the code. Specifically, the number of instances a gene / transcript feature has a sample count greater than 0 and the average CPM across all samples is atleast 1. Usually, if you used the same seq machine to run all of your samples, you will use the sequencing lane as factor for batch, and this can scale out to include multiple sequencing machines from different institutions, etc.

ADD REPLY
0
Entering edit mode

Can you post your code here?

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode
7.3 years ago
shoujun.gu ▴ 380

is its FDR too big? or you may check your filter criteria to see whether this gene is filtered by not enough reads ?

ADD COMMENT

Login before adding your answer.

Traffic: 1797 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6