There is good reason to suspect a problematic assumption or error somewhere in your code. It is very possible that the pattern reflects a valid practice, but I would not publish the data until I could explain to myself why it was occurring.
Claim: It is more likely that such a sharp demarcation is the result of a questionable data quality control or association testing procedure than biological reality.
Justification: Your data are telling you that p = 1 is reserved for only genes having -3 < LFC < 3. However, the p-value is a statistic that combines several characteristics (the mean and variance of the observations, how many observations there are in each state, etc.). Some of these same elements are used to estimate LFC, but others are not. As a result, such a sharp demarcation tends not to be observed.
If it were me I would look at these places and scan for assignments that could produce a pattern like this. I'd check ...
- The segment of code that handles correction for multiple testing
- The segment of code after association testing, but before plotting (for example once you have the DE results, but before you have filtered results into genes you want to plot, and genes you don't want to plot).
- The segment of code before association testing, that creates rules for which genes to include, and which to remove.
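One way to run the checks above is to summarize, at each of those three stages, how many genes sit at exactly p = 1 and what range of fold changes they cover. The following is only a minimal sketch: the function name `summarize_p_equal_one`, the row layout, and the toy values are hypothetical, and the keys `log2FoldChange` and `pvalue` are assumed to match the columns in your results table.

```python
def summarize_p_equal_one(rows, p_key="pvalue", lfc_key="log2FoldChange"):
    """Count genes with p exactly 1 and report the LFC range they span.

    `rows` is assumed to be a list of dicts, one per gene (hypothetical layout).
    """
    def span(values):
        # (min, max) of a list, or None when the list is empty
        return (min(values), max(values)) if values else None

    at_one = [r[lfc_key] for r in rows
              if r[p_key] is not None and r[p_key] == 1.0]
    below_one = [r[lfc_key] for r in rows
                 if r[p_key] is not None and r[p_key] < 1.0]
    n_missing = sum(1 for r in rows if r[p_key] is None)

    return {
        "n_p_equal_one": len(at_one),
        "lfc_range_p_equal_one": span(at_one),
        "n_p_below_one": len(below_one),
        "lfc_range_p_below_one": span(below_one),
        "n_missing": n_missing,
    }


# Toy rows, invented purely for illustration (not real results).
toy_rows = [
    {"gene": "g1", "log2FoldChange": -2.4, "pvalue": 1.0},
    {"gene": "g2", "log2FoldChange": 0.3, "pvalue": 1.0},
    {"gene": "g3", "log2FoldChange": 2.5, "pvalue": 1.0},
    {"gene": "g4", "log2FoldChange": 4.1, "pvalue": 1e-8},
    {"gene": "g5", "log2FoldChange": -3.7, "pvalue": 0.002},
    {"gene": "g6", "log2FoldChange": 0.1, "pvalue": None},
]

summary = summarize_p_equal_one(toy_rows)
print(summary)
```

If the LFC range of the p = 1 genes only becomes sharply bounded after a particular step, that step is the likely source of the demarcation.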
Once I understood why the pattern is observed, I might or might not be comfortable publishing the figure. At minimum, though, this is also worth doing in case one of your reviewers has the same question...
It seems you have removed the genes with log2FoldChange between -2.5 and 2.5 from the data object.
If the genes were removed, then there would be no black line at the base. Actually, removing those genes is an option if you want a cleaner plot.
Thanks for your feedback. Yes, I am also confused by this flat black line at the bottom. Any suggestions?
Well, I think that black line is just a whole lot of points that have a p-value of 1, that is, genes that are not significant. If you convert that to -log10(pvalue), you get zero. When there are so many such points, and they span a range of log2FC values from approx. -2.5 to 2.5, they make the black line.
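To make the arithmetic concrete, here is a small sketch of the transform used for the y-axis of a volcano plot. The p-values are invented for illustration and `neg_log10` is a hypothetical helper name.

```python
import math


def neg_log10(p):
    """Volcano-plot y value for a p-value in (0, 1]."""
    return -math.log10(p)


# Invented p-values: two non-significant genes at p = 1, two significant ones.
pvalues = [1.0, 1.0, 0.05, 1e-6]
y_values = [neg_log10(p) for p in pvalues]

# Every gene with p = 1 lands at y = 0, i.e. on the baseline of the plot.
n_on_baseline = sum(1 for y in y_values if y == 0)
print(y_values, n_on_baseline)
```

With thousands of genes at p = 1 spread across many fold-change values, those y = 0 points overlap into what looks like a solid line.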
Thanks. I just want to double-check: is this plot fine as it is?