Hello,
I have a question regarding the filtering step in gene expression analysis. My dataset consists of 8 samples for each of 3 treatments (24 samples in total). For each sample, counts were recorded for 10,000 genes. Since I plan to work with log2 counts in later analysis, I intend to remove genes with zero counts. My intended steps:
1. Remove all genes that have a zero count in at least one of the 24 samples.
2. Use filterByExpr to filter out genes with low counts. This step removes low-count genes based on CPM.
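To make the two steps concrete, here is a minimal Python sketch on made-up counts. All gene names, values, thresholds, and function names are hypothetical. The second step is only a simplified CPM cutoff that I wrote for illustration; it is not the rule that filterByExpr actually applies.

```python
# Toy count matrix: rows are genes, columns are samples (hypothetical values).
counts = {
    "geneA": [120, 98, 143, 110, 87, 131],
    "geneB": [0, 5, 3, 0, 7, 2],
    "geneC": [2, 1, 3, 2, 1, 2],
    "geneD": [560, 610, 480, 530, 590, 620],
}


def drop_genes_with_any_zero(counts):
    """Step 1: keep only genes with a nonzero count in every sample."""
    return {g: row for g, row in counts.items() if all(c > 0 for c in row)}


def library_sizes(counts):
    """Total count per sample (column sums of the matrix)."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def cpm_filter(counts, lib_sizes, min_cpm, min_samples):
    """Step 2 (simplified stand-in): keep genes whose CPM reaches
    min_cpm in at least min_samples samples."""
    kept = {}
    for gene, row in counts.items():
        cpms = [c / n * 1e6 for c, n in zip(row, lib_sizes)]
        if sum(v >= min_cpm for v in cpms) >= min_samples:
            kept[gene] = row
    return kept


# Assumption: library sizes are taken from the full matrix, before any filtering.
lib_sizes = library_sizes(counts)

step1 = drop_genes_with_any_zero(counts)
# The thresholds are arbitrary and only suit these tiny toy library sizes.
step2 = cpm_filter(step1, lib_sizes, min_cpm=5000, min_samples=3)

print(sorted(step1))  # genes surviving step 1
print(sorted(step2))  # genes surviving both steps
```

In this toy example geneB is dropped in step 1 because of its zeros, and geneC is dropped in step 2 because its CPM is low in every sample.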
The reasons are:
- Filter out all genes with zero counts: any zero count that is left becomes -Inf after the log2 transformation, which is really bad for later analysis. Changes on the log2 scale are considered biologically relevant, so it makes no sense to me to use log2(counts + 1).
- Filter genes using all 24 samples together: gene counts will be compared between treatments in later analysis. If I filtered within each group of 8 samples separately, some genes might be kept in treatment 1 but not in treatment 2, which would make them impossible to compare.
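Both reasons can be shown with a small Python sketch, again on made-up counts with hypothetical names. Python's math.log2(0) raises an error rather than returning -Inf, so the helper below returns -inf explicitly to mirror what R's log2(0) gives.

```python
import math


def log2_count(c):
    # math.log2(0) raises ValueError in Python; return -inf to mirror R's log2(0).
    return math.log2(c) if c > 0 else float("-inf")


# Hypothetical counts for two genes in two treatments (3 samples each).
counts = {
    "gene1": {"trt1": [12, 15, 9], "trt2": [0, 0, 0]},
    "gene2": {"trt1": [30, 28, 35], "trt2": [40, 44, 39]},
}


def nonzero_genes_per_group(counts, group):
    """Genes with no zero count within a single treatment group."""
    return {g for g, by_group in counts.items()
            if all(c > 0 for c in by_group[group])}


def nonzero_genes_joint(counts):
    """Genes with no zero count across all samples of all groups."""
    return {g for g, by_group in counts.items()
            if all(c > 0 for row in by_group.values() for c in row)}


per_trt1 = nonzero_genes_per_group(counts, "trt1")
per_trt2 = nonzero_genes_per_group(counts, "trt2")
joint = nonzero_genes_joint(counts)

print(log2_count(0))     # a zero count on the log2 scale
print(sorted(per_trt1))  # gene set when filtering treatment 1 alone
print(sorted(per_trt2))  # gene set when filtering treatment 2 alone
print(sorted(joint))     # gene set when filtering on all samples together
```

Filtering per group leaves gene1 in treatment 1 only, while joint filtering gives a single gene set shared by both treatments.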
I am new to the field. Please let me know whether I am on the right track. Thank you.
Thanks for your help. I'll stick to the manual.