Filtering lowly expressed genes - edgeR
6.9 years ago
noeD ▴ 130

Hello!

In the edgeR User's Guide, section 4.1.4 "RNA-Seq of oral carcinomas vs matched normal tissue" (http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf), I read this suggestion for filtering lowly expressed genes:

Normally we would also filter lowly expressed genes. For this data, all transcripts already have at least 50 reads for all samples of at least one of the tissues types.

How can I do this in R? How can I define "at least one of the tissues types"?

Thank you in advance

Best

filtering RNA-Seq R edgeR • 9.2k views

Hello, just to be sure: when you say "This keeps those genes which have minimum cpm of 1 in at least 2 samples", would these be 2 samples of the same condition? That is, at least 2 samples of the carcinoma group and 2 normal tissue samples. Am I right?

In the case of 3 different groups, with the smallest group having 15 samples, I was using the filter below:

keep <- rowSums(cpm(y)>0.2) >= 5

With a library size of 28M, a 0.2 CPM threshold corresponds to a cut-off of ~5.6 reads.
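The conversion between a CPM threshold and a raw read count can be checked with a couple of lines of base R. This is only a sketch: `cpm_to_count` is a hypothetical helper name, not an edgeR function, and it assumes a single fixed library size rather than the per-sample library sizes that cpm() would use.

```r
# Hypothetical helper (not part of edgeR): convert a CPM threshold into the
# raw read count it implies for a library of a given size.
cpm_to_count <- function(cpm_threshold, lib_size) {
  cpm_threshold * lib_size / 1e6
}

# 0.2 CPM in a library of 28 million reads is about 5.6 reads.
cpm_to_count(0.2, 28e6)
```

Because library sizes differ between samples, the same CPM threshold implies a slightly different read count in each sample.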


Use edgeR::filterByExpr(), it will apply appropriate cutoffs on a per-group basis.
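For reference, a minimal sketch of how filterByExpr() is typically called. The count matrix is simulated and the group labels are made up for illustration; only DGEList(), filterByExpr() and the subsetting with keep.lib.sizes=FALSE are actual edgeR usage.

```r
library(edgeR)

set.seed(1)
# Simulated counts: 1000 genes x 6 samples (made-up data for illustration only).
counts <- matrix(rnbinom(6000, mu = 20, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("carcinoma", "normal"), each = 3))

y <- DGEList(counts = counts, group = group)

# filterByExpr() reads the grouping from y$samples$group by default
# and returns one logical value per gene.
keep <- filterByExpr(y)
table(keep)

# Drop the filtered genes and recompute the library sizes.
y <- y[keep, , keep.lib.sizes = FALSE]
```

If the experiment has a more complex design, a design matrix can be passed via the `design` argument instead of relying on the group factor.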


Really useful, thank you!

6.9 years ago

See the "Filtering" section (2.6) in the manual.

> keep <- rowSums(cpm(y)>1) >= 2
> y <- y[keep, , keep.lib.sizes=FALSE]

This keeps those genes which have a minimum cpm of 1 in at least 2 samples. You may also write more specific criteria following the same rule if you have different sample/tissue types.
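As an example of a more specific criterion, here is a rough base R sketch of the rule quoted in the question ("at least 50 reads for all samples of at least one of the tissue types"). The toy counts, the sample names and the `group` factor are all made up for illustration; with a real DGEList you would use y$counts and y$samples$group instead.

```r
# Toy count matrix: 4 genes x 6 samples (values are made up for illustration).
counts <- matrix(
  c(60, 75, 80,  5,  3,  8,   # gene1: >= 50 in every carcinoma sample
     2,  4,  1, 90, 55, 70,   # gene2: >= 50 in every normal sample
    60, 10, 80, 55,  9, 70,   # gene3: one sample below 50 in each group
     0,  1,  0,  2,  0,  1),  # gene4: lowly expressed everywhere
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)
group <- factor(rep(c("carcinoma", "normal"), each = 3))

# One column per group: TRUE where every sample of that group has >= 50 reads.
passes_in_group <- sapply(levels(group), function(g) {
  rowSums(counts[, group == g, drop = FALSE] >= 50) == sum(group == g)
})

# Keep a gene if it passes in at least one group.
keep <- rowSums(passes_in_group) >= 1
keep
```

Here gene1 and gene2 are kept, while gene3 and gene4 are dropped. The resulting `keep` vector can be used to subset a DGEList in the same way as above.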

4.1 years ago
CL ▴ 40

Hi, I found the edgeR::filterByExpr() function really useful. However, when the groups have quite different numbers of samples, I am not sure whether the threshold of at least 2 samples is applied regardless of group: requiring 2 samples within a group of 15 is not the same as requiring 2 samples out of the total number of samples. I expect edgeR::filterByExpr() takes the number of samples per group into account. Thanks!
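On the group-size point: as far as I understand the filterByExpr() help page, the required number of samples is derived from the smallest group size, and is scaled down for groups larger than `large.n` (default 10) using `min.prop` (default 0.7). The sketch below only reproduces that arithmetic in base R; `min_sample_size` is a hypothetical helper, not an edgeR function, so please check the help page of your installed version before relying on it.

```r
# Hypothetical helper mirroring the minimum-sample-size rule described in the
# filterByExpr() documentation, with its documented defaults
# (large.n = 10, min.prop = 0.7).
min_sample_size <- function(group_sizes, large_n = 10, min_prop = 0.7) {
  n <- min(group_sizes)
  if (n > large_n) large_n + (n - large_n) * min_prop else n
}

min_sample_size(c(15, 20, 30))  # smallest group has 15 samples
min_sample_size(c(3, 3))        # small groups: the group size itself
```

So with a smallest group of 15 samples, a gene would need to pass the CPM cut-off in roughly 13.5 samples rather than 2. Note that, as I read it, those samples are counted across the whole dataset and do not all have to come from the same group.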


Why exactly do you post an answer with what I told you in the comment above?


I posted a new message instead of replying to your comment. Thanks again.
