Hi all,
I'm trying to perform differential gene expression analysis on patch-seq recorded neurons (most likely a heterogeneous population) from animals in two conditions (A and B). I have ~3 to 4 neurons per animal and ~50 cells in each condition. I'm using DESeq2 and edgeR (and others...).
1) Is it recommended to make pseudobulks (one pseudobulk = one animal), given that it is unlikely that 5 neurons recorded in the same animal belong to the same neuronal type, and that libraries are made cell by cell (more or less manually)?
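For concreteness, this is the kind of pseudobulk I have in mind. It is only a minimal Python sketch on toy data: the names `pseudobulk`, `toy_counts` and `toy_animals` are hypothetical, and summing raw counts per animal (rather than averaging) is an assumption.

```python
import numpy as np

def pseudobulk(counts, animal_ids):
    """Sum raw counts over all cells recorded in the same animal.

    counts: (n_cells, n_genes) array of raw integer counts.
    animal_ids: sequence of length n_cells with one animal label per cell.
    Returns the sorted unique animal labels and an (n_animals, n_genes) array.
    """
    counts = np.asarray(counts)
    animals, inverse = np.unique(np.asarray(animal_ids), return_inverse=True)
    inverse = np.ravel(inverse)  # row index of the animal each cell belongs to
    summed = np.zeros((animals.size, counts.shape[1]), dtype=counts.dtype)
    np.add.at(summed, inverse, counts)  # unbuffered add, so repeated animals accumulate
    return animals, summed

# Toy data (assumption): 5 cells x 3 genes, cells 0-1 from "a1", cells 2-4 from "a2".
toy_counts = np.array([[5, 0, 2],
                       [3, 1, 0],
                       [0, 4, 7],
                       [1, 1, 1],
                       [2, 0, 3]])
toy_animals = ["a1", "a1", "a2", "a2", "a2"]

animals, pb = pseudobulk(toy_counts, toy_animals)
print(animals)
print(pb)
```

Each row of `pb` would then be treated as one sample, with the animal as the unit of replication.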
2) edgeR and DESeq2 obviously give very different results, with very little overlap. Number of significant genes at FDR 0.05:
edgeR_GLM 1212
edgeR_QLF 4119
edgeR_exact 0
DESeq2_LRT_poscount 35
DESeq2_LRT_standard 13
DESeq2_Wald_poscount 399
DESeq2_Wald_standard 119
I can filter my gene list to keep only genes reaching a certain expression threshold (say at least 100 reads across the whole dataset, and at least 10 positive cells). However, is it "legal" to filter the gene list based on a precomputed log2FC threshold (i.e., discard all genes for which abs(log2FC) < log2(1.5)) before running edgeR/DESeq2?
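To make the two filters explicit, here is a minimal Python sketch on toy data. The function names, the pseudocount of 1, and the use of raw unnormalized means for the fold change are all assumptions made for illustration. The point is only that the first filter never looks at the condition labels, while the second one does.

```python
import numpy as np

def expression_filter(counts, min_total=100, min_cells=10):
    """Keep genes with >= min_total reads overall and >= min_cells positive cells.

    Uses only the count matrix, never the condition labels.
    """
    counts = np.asarray(counts)
    total_reads = counts.sum(axis=0)
    positive_cells = (counts > 0).sum(axis=0)
    return (total_reads >= min_total) & (positive_cells >= min_cells)

def observed_log2fc(counts, is_b, pseudocount=1.0):
    """Crude log2 fold change B vs A from raw mean counts (assumption: no normalization).

    This quantity depends on the condition labels.
    """
    counts = np.asarray(counts, dtype=float)
    is_b = np.asarray(is_b, dtype=bool)
    mean_a = counts[~is_b].mean(axis=0)
    mean_b = counts[is_b].mean(axis=0)
    return np.log2((mean_b + pseudocount) / (mean_a + pseudocount))

# Toy data (assumption): 12 cells x 3 genes, first 6 cells in A, last 6 in B.
toy = np.zeros((12, 3), dtype=int)
toy[:6, 0] = 7     # gene 0: expressed in every cell, higher in B
toy[6:, 0] = 15
toy[0, 1] = 200    # gene 1: many reads, but in a single cell
toy[:, 2] = 1      # gene 2: present in every cell, but very few reads
is_b = np.arange(12) >= 6

keep_expr = expression_filter(toy)
lfc = observed_log2fc(toy, is_b)
keep_lfc = np.abs(lfc) >= np.log2(1.5)
print(keep_expr, lfc, keep_lfc)
```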
3) At the same time, I can cross-validate a logistic regression to differentiate conditions A and B, which gives me a decent average accuracy (0.73). Selecting the top (200) contributing genes, either directly or using Recursive Feature Elimination, may give me another set of ~200 genes that "look" differentially regulated. Importantly, there is little overlap with DESeq2 and/or edgeR, but the selected genes seem to make sense from a physiological point of view (which is not the case with DESeq2 and edgeR). Should I be more/less confident with these genes, and why?
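For reference, the classifier approach could be set up roughly as below. This is a hedged sketch with scikit-learn on simulated data, not my actual analysis: the simulated counts, the log1p transform, the choice of 20 selected genes, and the decision to put the feature elimination inside the cross-validated pipeline (so it is redone on each training fold) are all assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data (assumption): 100 cells x 300 genes, 50 cells per condition.
rng = np.random.default_rng(0)
n_cells, n_genes = 100, 300
counts = rng.poisson(5.0, size=(n_cells, n_genes)).astype(float)
condition = np.repeat([0, 1], n_cells // 2)   # 0 = A, 1 = B
counts[condition == 1, :10] += 3.0            # toy signal in the first 10 genes

log_expr = np.log1p(counts)

# Scaling and gene selection live inside the pipeline, so both are refit
# on the training cells of every fold and never see the held-out cells.
model = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=20, step=0.2),
    LogisticRegression(max_iter=1000),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, log_expr, condition, cv=cv, scoring="accuracy")
print("mean accuracy:", scores.mean())
```

Note that this splits by cell; with several cells per animal, splitting by animal (for example with `GroupKFold`) would be an alternative design.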
Thanks in advance!