Hi everyone! I have never worked in bioinformatics, but I've been trying to analyse some snRNA-seq data using edgeR. We have two conditions with different numbers of replicates: Control (n=3) and Disease (n=7).
We wanted to filter the results by 2 cpm, so we wrote:
dge_miRNA <- DGEList(counts=as.matrix(miRNA_counts), group= group)
dge_miRNA list has 453 miRNAs.
keep <- filterByExpr(dge_miRNA, min.count=2, group=group)
dge_miRNA_2cpm <- dge_miRNA <- dge_miRNA[keep, , keep.lib.sizes=FALSE]
dge_miRNA_2cpm has 172 miRNAs.
My question is: what does "filterByExpr" actually do? I've tried to look everywhere for it, but I still don't quite get it.
For example if I get these miRNAs in the final cpm table:
miRNAs/Cntr1/Cntr2/Cntr3/Dis1/Dis2/Dis3/Dis4/Dis5/Dis6/Dis7
mmu-miR-338-3p->>0/1/1/0/6/1/2/4/1/3
mmu-let-7f-1-3p->>0/1/0/2/0/2/7/2/3/0
Why does mmu-miR-338-3p pass the filter but mmu-let-7f-1-3p does not? I thought filterByExpr would keep the miRNAs that have at least 2 cpm in at least 3 samples of either group. I thought it was 3 samples because that is the number of samples in my smallest group...
I'm also very confused about what the "min.prop" argument does.
Can anyone help me? :)
If you type
edgeR::filterByExpr.default
you get the source code for the function. It is a comparatively simple function, and it is easy to step through with some example data to follow the process. I would recommend keeping it at the defaults, though, since this is a very common prefiltering step that many people use and that is suggested by the developers, who have roughly two decades of experience in this field. In any analysis I would recommend the defaults unless you have expert knowledge or a good reason to change them.
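To make the logic easier to follow, here is a rough base-R sketch of what I understand the default method to do. It is a paraphrase for illustration only, not edgeR's actual code: the function name filter_by_expr_sketch, the toy counts and the library sizes are all made up, and the argument defaults are the ones I believe the current edgeR release uses, so please check them against the source printed above.

```r
# Hypothetical re-implementation for illustration; NOT edgeR's own code.
filter_by_expr_sketch <- function(counts, group, lib.size = colSums(counts),
                                  min.count = 10, min.total.count = 15,
                                  large.n = 10, min.prop = 0.7) {
  counts <- as.matrix(counts)

  # The size of the smallest group sets how many samples must pass the cutoff.
  n <- table(group)
  min.samples <- min(n[n > 0])
  # min.prop only comes into play when the smallest group exceeds large.n.
  if (min.samples > large.n) {
    min.samples <- large.n + (min.samples - large.n) * min.prop
  }

  # min.count is a raw count; it is turned into a CPM cutoff
  # using the median library size.
  cpm.cutoff <- min.count / median(lib.size) * 1e6
  cpm <- t(t(counts) / lib.size) * 1e6

  # Passing samples are counted across ALL libraries, not within one group.
  keep.cpm <- rowSums(cpm >= cpm.cutoff) >= min.samples - 1e-14
  # Independently, the total count over all samples must be large enough.
  keep.total <- rowSums(counts) >= min.total.count
  keep.cpm & keep.total
}

# Made-up toy data: 3 Control and 7 Disease samples, unequal library sizes.
group <- c(rep("Control", 3), rep("Disease", 7))
lib.size <- c(5e5, rep(1e6, 6), rep(2e6, 3))
counts <- rbind(
  geneA = c(3, 0, 0, 6, 6, 0, 0, 0, 0, 0),  # 3 passing samples, mixed groups
  geneB = c(0, 0, 0, 0, 0, 0, 0, 3, 3, 9),  # counts sit in the big libraries
  geneC = c(0, 0, 0, 3, 3, 3, 0, 0, 0, 0)   # passes CPM, total count too low
)
keep <- filter_by_expr_sketch(counts, group, lib.size = lib.size, min.count = 2)
print(keep)
```

In this toy example only geneA is kept. geneB has a count of 3 or more in three samples, but those libraries are large, so two of them fall below the CPM cutoff. geneC clears the CPM cutoff in three samples but its total count of 9 is below min.total.count. So min.count=2 is not the same as "2 cpm" unless the median library size happens to be one million, and the three samples do not have to belong to the same group. As far as I know, when you pass a DGEList the library sizes used are the effective ones (lib.size multiplied by norm.factors), which is worth keeping in mind when comparing against your own cpm table. With your group sizes (3 and 7), min.prop should have no effect, because it only applies once the smallest group has more than large.n samples.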