Pretty much all Gene Ontology (GO) analysis works the same way and answers the same question: for a given GO term, is your list of genes associated with that term at a significantly higher rate than a background set? Your background could be all genes or a subset of genes, depending on what exactly you're trying to determine. So, for example, let's take your nucleic acid binding term in humans. Roughly 4,000 protein-coding genes are associated with that term, a frequency of 20% if we round the number of protein-coding genes to 20,000.
Let's say you have a list of 100 genes that are down-regulated after treatment with a compound, and you want to determine which biological processes may be disrupted. The analysis goes through every term and compares the frequency of genes associated with that term between your list and the background to calculate a p-value (or q-value, FDR, or whatever). Say 80 of your 100 genes are associated with the nucleic acid binding term: that's 80% against a 20% background frequency, an increase very unlikely to be due to chance, and the GO analysis would spit out that term as significantly enriched in your test set.
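For concreteness, here's a minimal sketch of that calculation in Python using scipy. Most over-representation tools use some flavor of the hypergeometric/Fisher's exact test under the hood (DAVID's EASE score, for instance, is a conservative variant of Fisher's exact test); the counts below are just the hypothetical numbers from the example above, not real data.

```python
from scipy.stats import fisher_exact, hypergeom

# Hypothetical counts from the example above: 20,000 protein-coding genes in
# the background, 4,000 annotated with "nucleic acid binding", and 80 of the
# 100 down-regulated genes carrying that annotation.
background_total = 20_000
term_total = 4_000   # background genes annotated with the term
list_total = 100     # genes in the test list
list_in_term = 80    # test-list genes annotated with the term

# 2x2 contingency table: rows = in list / not in list,
# columns = annotated with the term / not annotated.
in_list_not_term = list_total - list_in_term
not_list_in_term = term_total - list_in_term
not_list_not_term = background_total - list_total - not_list_in_term
table = [[list_in_term, in_list_not_term],
         [not_list_in_term, not_list_not_term]]

# One-sided Fisher's exact test: is the term over-represented in the list?
odds_ratio, p_fisher = fisher_exact(table, alternative="greater")

# Equivalent hypergeometric formulation: P(X >= 80) when drawing 100 genes
# from 20,000, of which 4,000 are annotated with the term.
p_hyper = hypergeom.sf(list_in_term - 1, background_total, term_total, list_total)

print(f"Fisher p = {p_fisher:.3g}, hypergeometric p = {p_hyper:.3g}")
```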
However, the background set is important. Including all genes in it makes a lot of assumptions, especially considering not all genes are expressed in all tissues or conditions. For example, if you're working in T cells and using all genes as your background set, almost any test set you put in will be enriched for T cell-related terms. But if you limit your background to genes that are actually expressed (say that cuts the background down to ~12,000 genes), your background frequencies will be quite different, especially for cell- or tissue-type-specific terms. At a minimum, you'll usually want to exclude genes that aren't expressed from your background set so that it captures a better snapshot of the "normal" state of the cell. This is one of the most overlooked concepts in GO analysis and renders a lot of the analyses seen in published papers relatively meaningless.
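To put rough numbers on that, here's the same hypergeometric test with made-up counts for a hypothetical term annotated to 600 of 20,000 genes overall, 550 of which are expressed in the cell type of interest. The observed overlap doesn't change at all, only the background does, yet with these made-up numbers the term looks nominally significant against the all-gene background but not against the expressed-gene background.

```python
from scipy.stats import hypergeom

# Made-up numbers for illustration only: 8 of 100 test genes carry the
# annotation; the term covers 600/20,000 genes overall but 550/12,000
# expressed genes in this cell type.
list_total, list_in_term = 100, 8

# Background 1: all protein-coding genes
p_all = hypergeom.sf(list_in_term - 1, 20_000, 600, list_total)

# Background 2: only genes expressed in this cell type / condition
p_expressed = hypergeom.sf(list_in_term - 1, 12_000, 550, list_total)

print(f"all-gene background:  p = {p_all:.3g}")
print(f"expressed background: p = {p_expressed:.3g}")
```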
As for your actual question, it really means neither of those things; it merely reflects confidence in the difference in frequencies between gene sets, as described above. GO has very little to do directly with research evidence, though the terms assigned to a given gene may be derived from it; many of the associations are also inferred from things like protein structure and domains. Lastly, GO terms can be hilariously broad and nearly useless at times, and I think this may be one such case: "nucleic acid binding" and "cytoplasm" yield little information about biological function. Indeed, many broad categories like that are just umbrella terms with many, many additional child terms under them.
One last thing: DAVID is probably one of the most unwieldy GO analysis tools out there at this point. There was a time when it was one of very few options, but it's now quite outdated, in my opinion. This is definitely subjective, but I find tools like enrichR and clusterProfiler to be much more attractive options.
Just approaching this from a logical point of view, it seems to me that your input, a gene list, is being compared to a collection of gene lists, each of which is associated with a GO term. Would that not mean the scores are essentially a measure of the overlap between the two lists? Also, I think significance values shouldn't be comparable across the three categories (BP/CC/MF), but I might be mistaken - there might be a normalization step in the scoring process. But then, wouldn't you just want the most probable CC, MF, and BP terms? Why compare between them if that's the case?
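A quick sketch with made-up numbers illustrates the distinction: the score is derived from the overlap, but it's conditioned on the term size, list size, and background size rather than being a raw overlap count. The same overlap can be anywhere from highly significant to entirely unremarkable depending on how many genes the term annotates.

```python
from scipy.stats import hypergeom

# Hypothetical example: two GO terms that each overlap the same 100-gene
# list by 10 genes, but differ in how many background genes they annotate.
background, list_size, overlap = 20_000, 100, 10

p_small_term = hypergeom.sf(overlap - 1, background, 200, list_size)    # term annotating 200 genes
p_large_term = hypergeom.sf(overlap - 1, background, 2_000, list_size)  # term annotating 2,000 genes

print(f"same overlap of {overlap}: small term p = {p_small_term:.2g}, "
      f"large term p = {p_large_term:.2g}")
```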
I am just getting a list of all the GO terms reported by DAVID above a certain significance threshold. Of course, comparing between significance values is not something I want or plan to do, but the question struck me when I saw the list of significant ontology terms.