Normally we would also filter lowly expressed genes. For this data, all transcripts already have at least 50 reads for all samples of at least one of the tissue types.
How can I do this in R? How do I define "at least one of the tissue types"?
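A minimal base-R sketch of the quoted criterion follows. It assumes a raw count matrix `counts` (genes in rows, samples in columns) and a factor `tissue` giving each sample's tissue type. Both names and the toy values are hypothetical. A gene is kept if, for at least one tissue, every sample of that tissue has at least 50 reads.

```r
# Hypothetical toy data: 4 genes (rows) x 4 samples (columns), two tissues.
counts <- matrix(c(60, 70,  5,  3,
                   10, 12, 80, 90,
                   55, 20, 60, 65,
                    1,  2,  3,  4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))
tissue <- factor(c("A", "A", "B", "B"))
min_reads <- 50

# For each tissue: TRUE if the gene has >= min_reads in every sample of that tissue.
passes <- lapply(levels(tissue), function(t) {
  sub <- counts[, tissue == t, drop = FALSE]
  rowSums(sub >= min_reads) == ncol(sub)
})

# Keep a gene if it passes in at least one tissue.
keep <- Reduce(`|`, passes)
counts_filtered <- counts[keep, , drop = FALSE]
```

Here gene1 passes through tissue A, gene2 and gene3 pass through tissue B, and gene4 is dropped. With a DGEList `y`, the same logic applies to `y$counts` and `y$samples$group`.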
Hello, just to be sure.
When you say "This keeps those genes which have minimum cpm of 1 in at least 2 samples", does that mean 2 samples of the same condition, i.e. at least 2 samples from the carcinoma group and at least 2 normal tissue samples? Am I right?
With 3 different groups, the smallest of which has 15 samples, I was using the following filter:
# keep genes with CPM > 0.2 in at least 5 samples (counted across all groups)
keep <- rowSums(cpm(y) > 0.2) >= 5
With a library size of 28M reads, a 0.2 CPM threshold corresponds to a cut-off of about 5.6 reads (0.2 × 28).
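The read-count equivalent of a CPM threshold is CPM × library size / 10^6. A small helper makes that conversion explicit (the name `cpm_to_reads` is hypothetical). When library sizes differ, the same CPM threshold corresponds to a different read count in each sample.

```r
# Hypothetical helper: read count corresponding to a CPM threshold.
cpm_to_reads <- function(cpm_threshold, lib_size) cpm_threshold * lib_size / 1e6

cpm_to_reads(0.2, 28e6)                  # about 5.6 reads
cpm_to_reads(0.2, c(20e6, 28e6, 40e6))   # one value per library: 4, 5.6, 8
```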
This keeps those genes which have a minimum CPM of 1 in at least 2 samples. You may also write more specific criteria following the same rule if you have different sample or tissue types.
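Because `rowSums()` runs over every column, a filter of the form `rowSums(cpm(y) > 1) >= 2` counts qualifying samples from any group. It does not require both samples to come from the same condition. The base-R sketch below contrasts that with a stricter within-group variant. It computes CPM directly from column totals as a stand-in for `edgeR::cpm()`, assuming all normalisation factors are 1. The toy data and group labels are hypothetical.

```r
# Hypothetical toy data: 4 genes x 4 samples, two conditions.
counts <- matrix(c(  10,   20,    0,    0,
                     15,    0,    0,   25,
                      0,    0,   30,    0,
                   1000, 1000, 1000, 1000),
                 nrow = 4, byrow = TRUE)
group <- factor(c("carcinoma", "carcinoma", "normal", "normal"))

# Base-R stand-in for edgeR::cpm(), assuming all normalisation factors are 1.
lib_size <- colSums(counts)
cpm_mat <- sweep(counts, 2, lib_size, "/") * 1e6

# "CPM > 1 in at least 2 samples": the 2 samples may come from any group.
keep_any <- rowSums(cpm_mat > 1) >= 2

# Stricter variant: CPM > 1 in at least 2 samples within some single group.
keep_within <- Reduce(`|`, lapply(levels(group), function(g) {
  rowSums(cpm_mat[, group == g, drop = FALSE] > 1) >= 2
}))
```

The second gene (expressed in one carcinoma and one normal sample) passes `keep_any` but fails `keep_within`.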
Hi, I found the edgeR::filterByExpr() function really useful. My concern is that when the groups have quite different numbers of samples, I am not sure whether the "at least 2 samples" threshold is applied regardless of group. 2 samples out of a group of 15 is not the same as 2 samples out of all samples combined.
I expect edgeR::filterByExpr() takes the number of samples per group into account.
Thanks!
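As described in the edgeR documentation for `filterByExpr()`, group sizes set how many samples must pass the CPM cutoff. They are not used to test each group separately. The required number is the smallest group size. When that exceeds `large.n` (default 10), it is relaxed to `large.n + (n - large.n) * min.prop` (default `min.prop = 0.7`). The function below is a hypothetical base-R re-implementation of just that step, using those default values. If a design matrix is supplied instead of a group, edgeR uses the minimum inverse leverage, which is not covered here. Check `?filterByExpr` for your installed version, since defaults may change.

```r
# Hypothetical re-implementation of the sample-size step of edgeR::filterByExpr(),
# using its default large.n = 10 and min.prop = 0.7.
min_sample_size <- function(group, large.n = 10, min.prop = 0.7) {
  n <- table(group)
  m <- min(n[n > 0])                           # size of the smallest non-empty group
  if (m > large.n) {
    m <- large.n + (m - large.n) * min.prop    # relax the rule for large groups
  }
  m
}

# Three groups, the smallest with 15 samples (the other two sizes are made up).
group <- factor(rep(c("g1", "g2", "g3"), times = c(15, 20, 30)))
min_sample_size(group)  # 13.5: a gene must pass in at least 14 samples, from any group
```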
Use edgeR::filterByExpr(); it applies appropriate cutoffs based on the group sizes.
Really useful, thank you!
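The CPM cutoff in `filterByExpr()` is derived from `min.count` (default 10 reads) at the median library size. At a median of 28M reads, that works out to 10 / 28 ≈ 0.357 CPM, somewhat stricter than the 0.2 CPM filter used earlier in the thread. The sketch below re-implements the default keep rule in base R for illustration only. `keep_by_expr` is a hypothetical name, and `min_n` is the required number of samples, which edgeR derives from the smallest group size. The sketch assumes normalisation factors of 1.

```r
# Hypothetical base-R sketch of edgeR::filterByExpr()'s default keep rule.
# Assumes raw counts, normalisation factors of 1, and a caller-supplied min_n
# (in edgeR, min_n is derived from the smallest group size).
keep_by_expr <- function(counts, min_n, min.count = 10, min.total.count = 15) {
  lib_size <- colSums(counts)
  cpm_cutoff <- min.count / median(lib_size) * 1e6   # min.count reads at the median library size
  cpm_mat <- sweep(counts, 2, lib_size, "/") * 1e6
  tol <- 1e-14
  keep_cpm <- rowSums(cpm_mat >= cpm_cutoff) >= (min_n - tol)  # enough samples above the cutoff
  keep_total <- rowSums(counts) >= (min.total.count - tol)     # enough reads in total
  keep_cpm & keep_total
}

# CPM cutoff for a median library size of 28M reads: about 0.357
10 / 28e6 * 1e6
```

In a real analysis, call edgeR itself, e.g. `keep <- filterByExpr(y, group = group)` followed by `y <- y[keep, , keep.lib.sizes = FALSE]`.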