I received comments from reviewers on a manuscript. One comment asks how gene length was accounted for in the GO enrichment analyses.
I understand that differential expression analyses of RNA-seq data can be affected by gene length bias, which can carry over into GO enrichment analyses. However, I conducted a genome-wide scan for selection using whole-genome DNA resequencing data. We identified outlier genes putatively under selection by computing multiple population genomic metrics (Tajima's D, NCD, Pi, etc.) in sliding genomic windows within each discrete population.
Do I need to account for gene length in my GO analyses? I can't find any literature on gene length bias in GO analyses on DNA data where we are not dealing with read counts.
I don't think so. My logic is that finding multiple signals in a longer gene is unimportant, since whether a gene is an outlier is a boolean value. Are we more likely to find a signal of selection in longer genes due to drift? Maybe, but we have neutral simulations showing that our thresholds did not flag any neutral signals in silico.
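
To make the "Maybe" above concrete, here is a minimal sketch of how a length effect could arise in a window-based scan even though outlier status is boolean: a longer gene overlaps more windows, so it gets more chances to be flagged at least once. The window size, step, gene coordinates, and per-window false-positive rate below are hypothetical values, not the ones from my analysis, and the calculation assumes windows are independent, which overlapping, linked windows are not. It illustrates the mechanism only and does not estimate its size in real data.

```python
def n_overlapping_windows(gene_start, gene_length, window_bp, step_bp):
    """Count sliding windows [s, s + window_bp) that overlap the gene
    [gene_start, gene_start + gene_length), with window starts at 0, step_bp, 2*step_bp, ..."""
    gene_end = gene_start + gene_length
    # Smallest window index k with k*step_bp > gene_start - window_bp (never below 0).
    first = max(0, (gene_start - window_bp) // step_bp + 1)
    # Largest window index k with k*step_bp < gene_end (ceiling division minus one).
    last = -(-gene_end // step_bp) - 1
    return max(0, last - first + 1)


def prob_flagged_at_least_once(n_windows, per_window_rate):
    """Chance that at least one of n_windows exceeds the outlier threshold under
    neutrality, ASSUMING independent windows (a simplification)."""
    return 1.0 - (1.0 - per_window_rate) ** n_windows


if __name__ == "__main__":
    # Hypothetical settings: 5 kb windows, 1 kb step, 1% per-window false-positive rate.
    for length in (1_000, 10_000, 100_000):
        k = n_overlapping_windows(10_000, length, window_bp=5_000, step_bp=1_000)
        p = prob_flagged_at_least_once(k, per_window_rate=0.01)
        print(f"gene length {length:>7} bp: {k:>3} windows, P(flagged) = {p:.3f}")
```

If the per-window false-positive rate really is close to zero, as the neutral simulations suggest, this probability stays close to zero for every gene length, which is the argument I would make to the reviewer.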