and found that in Fig.2b the authors use unadjusted DESeq2 p-values to determine the significance levels of differential expression between AD and control groups, without a clear justification for using the unadjusted p-values. I'm wondering whether it is generally accepted to use unadjusted p-values in this kind of analysis, and wouldn't there be more false positives in this case? Thank you.
I would say no, it's not generally accepted, especially if it represents a primary piece of data. One should adjust p-values whenever possible. On the other hand, p-values and their distributions rest on assumptions about how the data are distributed, and some experiments have effects that violate those assumptions or are full of unexplained noise, so that after adjusting p-values no genes survive a cutoff. So, as you say, one can expect lots of false positives, and confidence in any one DE gene might be very low. But that doesn't, by definition, mean an experiment is useless or contains no information. If meaningful groups of genes have been pushed in one direction or the other, one can still use the experiment as a source of hypotheses to test. So as long as there are good, robust follow-up experiments, i.e. a way to deal with a high false positive rate among the candidates, using a low-confidence method to screen for DE genes might be acceptable, especially in difficult systems. It can also go the other way: some experiments in which no genes survive p-value adjustment should simply be dropped, or accepted at face value as the result of the experiment. Unfortunately, people sometimes waste time clinging to bad data, but there is usually a way to discriminate between these two situations.
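For what it's worth, adjusting the p-values yourself is a one-liner once you have them. Here is a minimal Python sketch of a Benjamini-Hochberg adjustment using statsmodels; the p-values are made up for illustration, and in a real DESeq2 workflow you would normally just use the padj column it already reports:

```python
# Minimal sketch: Benjamini-Hochberg adjustment of a vector of raw p-values.
# The p-values below are made up for illustration only.
import numpy as np
from statsmodels.stats.multitest import multipletests

raw_p = np.array([0.0001, 0.003, 0.02, 0.04, 0.2, 0.5, 0.8])

# method="fdr_bh" applies the Benjamini-Hochberg procedure at the given FDR level.
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")

for p, q, r in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.4f}  BH-adjusted = {q:.4f}  significant at FDR 0.05: {r}")
```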
In any multiple-testing situation you expect (n - tp) * a false positives, where n is the number of comparisons, a is the alpha level of the test, and tp is the number of true positives. Thus if you were to do 20,000 tests, you would expect roughly 1,000 false positive results at an unadjusted p-value threshold of 0.05 in the situation where tp << n. However, in figure 2b of the referenced paper the authors have not done 20,000 tests; they have done 25 (they only tested candidate genes, or so they claim). Here, even if none of the genes were really DE (i.e. all null hypotheses were true), we would only expect 1.25 false positives per tissue pair, whereas they observe 7 rejections of the null hypothesis, implying an expected FDR of 1.25/7 ≈ 18%, which is probably reasonable. They do, however, appear to do these comparisons in multiple tissues (even if they don't say so), which raises the question of whether you should count the comparisons across all tissues, and that would imply a significantly higher false discovery rate.
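To make that arithmetic concrete, here is a short Python sketch of the expected-false-positive calculation above; the counts 20,000, 25 and 7 are the ones used in the discussion, and tp << n is assumed so that n - tp ≈ n:

```python
# Expected false positives under the null: roughly (n - tp) * alpha, with tp << n.
alpha = 0.05

# Genome-wide screen: ~20,000 tests.
n_genome = 20_000
print(n_genome * alpha)                  # ~1000 expected false positives

# The 25 candidate genes tested per tissue pair in Fig. 2b.
n_candidates = 25
expected_fp = n_candidates * alpha       # 1.25 expected false positives
observed_rejections = 7
print(expected_fp / observed_rejections) # ~0.18, i.e. an implied FDR of ~18%
```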
Now you might quite reasonably ask: why go to the bother of doing RNA-seq if you are then only going to test 25 genes? You'd be better off doing qPCR. The answer is that they probably did test 20,000 genes, and when that didn't work, restricted themselves to the 25. That is poor statistical practice, but not as poor as drawing conclusions about individual genes (groups of genes are different) from unadjusted p-values.
Without adjusting your p-values, the probability of making at least one false positive (the family-wise error rate) grows as
1 - (1 - α)^n
where n is the number of comparisons. If you run tests with an alpha of 0.05 for 20,000 genes, that family-wise type I error rate is essentially 1 instead of your stated alpha of 0.05.
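As a quick sanity check on that claim, here is a short Python sketch of the family-wise error rate formula for a few values of n (assuming independent tests):

```python
# Probability of at least one false positive across n independent tests
# at level alpha: 1 - (1 - alpha)^n.
alpha = 0.05

for n in (1, 25, 100, 20_000):
    fwer = 1 - (1 - alpha) ** n
    print(f"n = {n:>6}: P(at least one false positive) = {fwer:.4f}")
# For n = 20,000 the result is indistinguishable from 1.
```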