I am confused about the normalization and statistics behind DE programs. I am using edgeR to analyze two conditions.
Example for one gene (raw counts), with four replicates per condition, control (C) and treatment (T):
gene= FBgn0034710
Controls = 820-1618-1728-1007
Treatments = 7195-1252-1312-1291
Result of edgeR:
logFC = 1.10, logCPM = 6.5, LR = 9.77, PValue = 0.0017, FDR = 0.02
Why is the FBgn0034710 gene statistically significant if one replicate has far more raw counts (7195) than the others? I know that library size could be a factor, but it is similar across the replicates.
Try removing such outliers within a group and rerunning the statistical test. I do not think edgeR has any mechanism to prune such data. You should filter out such discrepancies in expression level, within each group and across groups, before feeding the data to edgeR.
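To make the suggestion concrete, here is a rough sketch in plain Python (not edgeR) using the counts from the question. It flags a replicate whose raw count is more than 3-fold away from its group median and compares the log2 ratio of group means with and without it. The 3-fold cutoff and the function names are my own assumptions for illustration, and the sketch ignores library-size normalization and the negative binomial model that edgeR actually fits, so the numbers are only indicative.

```python
import math
from statistics import median

# Raw counts for FBgn0034710 as listed in the question.
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]


def flag_outliers(counts, max_fold=3.0):
    """Return indices of replicates more than max_fold away from the group median.

    max_fold=3.0 is an arbitrary cutoff chosen for illustration only.
    """
    med = median(counts)
    return [
        i for i, c in enumerate(counts)
        if c > med * max_fold or c * max_fold < med
    ]


def log2_ratio_of_means(treated, controls):
    """Crude log2 fold change from raw group means (no normalization)."""
    mean_t = sum(treated) / len(treated)
    mean_c = sum(controls) / len(controls)
    return math.log2(mean_t / mean_c)


flagged = flag_outliers(treatment)
kept = [c for i, c in enumerate(treatment) if i not in flagged]

print("flagged treatment replicates:", flagged)
print("log2 ratio, all replicates:", round(log2_ratio_of_means(treatment, control), 2))
print("log2 ratio, outlier removed:", round(log2_ratio_of_means(kept, control), 2))
```

With all four treatment replicates the crude log2 ratio is about 1.1, close to the reported logFC. With the first treatment replicate dropped it is close to 0, which suggests the single high replicate is driving the result.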