The total expressed genes in RNA-Seq data
7 months ago
Pegasus ▴ 120

Hey everyone,

When analyzing RNA-Seq data with edgeR, which filter is commonly used to determine the "total expressed genes"?

While I've employed the criterion of logCPM > 1 as one of the filters to identify differentially expressed genes (DEGs), I'm uncertain whether I should apply the same filter to calculate the total expressed genes.

total <- total[total$logCPM > 1, ]

Alternatively, some discussions suggest considering TPM (Transcripts Per Million) for this purpose.

normed <- normed[rowSums(normed > 0) > 1, ] 

Thanks for any insights!

RNA-SEQ • 330 views
7 months ago
ATpoint 86k

There is no robust definition of "expressed" genes; this has been asked many times before. edgeR doesn't care about "expressed". What it cares about (via filterByExpr) is whether a gene has sufficient counts for a differential analysis, and that is often misinterpreted. See the edgeR user guide for the recommended filter (filterByExpr). The choice of expression unit does not change the fact that any definition of "expressed" is arbitrary without a gold standard to benchmark against.
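For reference, a minimal sketch of the filterByExpr workflow from the edgeR user guide. The count matrix, sample names, and group labels below are simulated purely for illustration and are not from the original question; swap in your own counts and design.

```r
library(edgeR)

# Simulated counts (hypothetical): 1000 genes x 6 samples.
# The first 500 genes are given very low means, the last 500 higher means.
set.seed(1)
counts <- matrix(
  rnbinom(6000, mu = rep(c(0.2, 50), each = 500), size = 2),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6))
)
group <- factor(rep(c("ctrl", "trt"), each = 3))

y <- DGEList(counts = counts, group = group)

# filterByExpr flags genes with enough counts for a differential analysis.
# It says nothing about whether a gene is biologically "expressed".
keep <- filterByExpr(y, group = group)
table(keep)

# Subset and recompute library sizes, as recommended in the user guide.
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
```

The number of genes retained depends on library sizes and group sizes, so it should be read as "testable with this design", not as a count of expressed genes.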

People sometimes rank genes by FPKM and take the inflection point of the curve, or apply simple cutoffs such as FPKM > 1, but after all, the cell does not care about inflection points or expression units. These approaches are naive and not robust. Ask yourself whether you really need "expressed" genes for your analysis, rather than just those with sufficient counts as edgeR defines them.
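To see how arbitrary a fixed cutoff is, here is a small sketch that counts genes above several average logCPM thresholds. The data are simulated and the cutoff values are arbitrary picks for illustration, not recommendations.

```r
library(edgeR)

# Simulated counts (hypothetical): 1000 genes x 6 samples,
# spanning four expression levels from near-zero to high.
set.seed(1)
counts <- matrix(
  rnbinom(6000, mu = rep(c(0.2, 5, 50, 500), each = 250), size = 2),
  nrow = 1000
)

# Per-sample log2 CPM, then the average across samples for each gene.
logcpm <- cpm(counts, log = TRUE)
avg_logcpm <- rowMeans(logcpm)

# Number of genes called "expressed" under different arbitrary cutoffs.
cutoffs <- c(0, 1, 2, 5)
n_expressed <- sapply(cutoffs, function(k) sum(avg_logcpm > k))
names(n_expressed) <- paste0("logCPM>", cutoffs)
print(n_expressed)
```

The count of "expressed" genes shifts with every cutoff, and nothing in the data says which threshold is the right one.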
