What does "GO functional annotation" actually mean?
A dataset contains 3962 genes with **GO Functional Annotations**, but the total number of genes is 6100.
My understanding is that these 3962 genes are annotated with molecular function (F) terms in GO.
The GO association file contains the GO ID, the gene name, and the annotation type (cellular component (C), molecular function (F), biological process (P)), among other columns.
What does the statement above mean? And how can I calculate the number of genes with GO functional annotation in my own code, without using any packages?
I would read it as:
Of your 6100 genes, a GO term (independent from annotation type) was assigned to 3962 genes.
A gene with a GO term could also have GO terms of more than one annotation type.
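Under this reading, the count is simply the number of input genes that appear with at least one GO term, whatever the aspect. A minimal sketch, assuming a GAF 2.x association file (tab-separated, `!` comment lines, column 3 = gene symbol, column 4 = qualifier, column 9 = aspect) and an input list of gene symbols; the helper names and toy rows are hypothetical:

```python
# Minimal sketch: count input genes with at least one GO term of ANY aspect (P, F, C).
# Assumes GAF 2.x columns: 3 = gene symbol, 4 = qualifier, 5 = GO ID, 9 = aspect.

def gene_aspects(gaf_lines):
    """Map each gene symbol to the set of aspects it is annotated with."""
    aspects = {}
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, aspect = cols[2], cols[3], cols[8]
        if "NOT" in qualifier.split("|"):
            continue  # NOT means the gene is explicitly NOT linked to this term
        aspects.setdefault(symbol, set()).add(aspect)
    return aspects


def count_annotated(genes, gaf_lines):
    """Count distinct input genes that have at least one GO term of any aspect."""
    aspects = gene_aspects(gaf_lines)
    return sum(1 for g in set(genes) if g in aspects)


# Hypothetical helper: build a 17-column GAF row from the few fields we need.
def gaf_row(symbol, go_id, aspect, qualifier=""):
    return "\t".join(["DB", "ID:" + symbol, symbol, qualifier, go_id, "REF", "IEA",
                      "", aspect, "", "", "protein", "taxon:0", "20240101", "DB", "", ""])


toy_gaf = [
    "!gaf-version: 2.2",
    gaf_row("geneA", "GO:0003824", "F"),
    gaf_row("geneA", "GO:0008150", "P"),  # geneA has terms from two aspects
    gaf_row("geneB", "GO:0005737", "C"),
    gaf_row("geneC", "GO:0003824", "F", qualifier="NOT"),
]
my_genes = ["geneA", "geneB", "geneC", "geneD"]
print(count_annotated(my_genes, toy_gaf))  # 2 (geneA and geneB)
```

geneA is counted once even though it has terms from two aspects; if your list uses database IDs rather than symbols, match on column 2 instead.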
3962 genes out of 6100 are annotated. The problem is interpreting "functional annotation" without context. I have seen it used to mean any annotation in GO (i.e., across all three domains), and sometimes to mean biological function, which would refer to the biological process domain. If you know that "functional annotation" refers to the molecular function domain, then it means that 3962 of the 6100 genes are annotated with GO terms from that domain.
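One way to check which interpretation produced a reported number is to count annotated genes per domain and across all domains, then see which figure matches. A minimal sketch, assuming a GAF 2.x file (column 3 = symbol, column 4 = qualifier, column 9 = aspect) and a list of gene symbols; the function name and toy rows are hypothetical:

```python
# Minimal sketch: annotated-gene counts per GO aspect and across all aspects.
# Assumes GAF 2.x columns: 3 = gene symbol, 4 = qualifier, 9 = aspect (P/F/C).
from collections import Counter


def per_aspect_counts(genes, gaf_lines):
    wanted = set(genes)
    seen = set()  # (gene, aspect) pairs; a set so each gene counts once per aspect
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, aspect = cols[2], cols[3], cols[8]
        if symbol in wanted and "NOT" not in qualifier.split("|"):
            seen.add((symbol, aspect))
    counts = Counter(aspect for _, aspect in seen)  # genes per aspect
    counts["any"] = len({g for g, _ in seen})        # genes with any aspect
    return counts


# Hypothetical helper: build a 17-column GAF row from the few fields we need.
def gaf_row(symbol, go_id, aspect, qualifier=""):
    return "\t".join(["DB", "ID:" + symbol, symbol, qualifier, go_id, "REF", "IEA",
                      "", aspect, "", "", "protein", "taxon:0", "20240101", "DB", "", ""])


toy_gaf = [
    "!gaf-version: 2.2",
    gaf_row("geneA", "GO:0003824", "F"),
    gaf_row("geneA", "GO:0008150", "P"),
    gaf_row("geneB", "GO:0005737", "C"),
    gaf_row("geneC", "GO:0003824", "F", qualifier="NOT"),
]
my_genes = ["geneA", "geneB", "geneC", "geneD"]
counts = per_aspect_counts(my_genes, toy_gaf)
print({a: counts[a] for a in ("P", "F", "C", "any")})
# {'P': 1, 'F': 1, 'C': 1, 'any': 2}
```

The per-domain counts add up to more than the "any" count because one gene can be annotated in several domains, so only the "any" figure corresponds to the all-domains reading.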
> How can I calculate these number of genes that are with GO Functional annotation by coding, not using any packages?
Simply count the genes in your list that have molecular function annotations in the GO association file. Depending on your goal, you may also want to treat as unannotated those genes that are associated only with non-informative, high-level terms (e.g., the root term molecular_function, GO:0003674).
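A minimal sketch of that count, assuming a GAF 2.x file (column 3 = symbol, column 4 = qualifier, column 5 = GO ID, column 9 = aspect). Which terms count as "non-informative" is a judgment call; here the default set holds only the molecular function root term GO:0003674 (root-term annotations commonly carry the ND, "no biological data available", evidence code). The function name and toy rows are hypothetical:

```python
# Minimal sketch: count input genes with informative molecular function (aspect F) terms.
# Assumes GAF 2.x columns: 3 = symbol, 4 = qualifier, 5 = GO ID, 9 = aspect.
# The default "uninformative" set (MF root term only) is an assumption; extend as needed.

MF_ROOT = "GO:0003674"  # molecular_function root term


def count_mf_annotated(genes, gaf_lines, uninformative=frozenset({MF_ROOT})):
    wanted = set(genes)
    informative = set()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id, aspect = cols[2], cols[3], cols[4], cols[8]
        if symbol not in wanted or aspect != "F":
            continue  # keep only molecular function annotations of input genes
        if "NOT" in qualifier.split("|") or go_id in uninformative:
            continue  # negated or non-informative annotation: does not count
        informative.add(symbol)
    return len(informative)


# Hypothetical helper: build a 17-column GAF row from the few fields we need.
def gaf_row(symbol, go_id, aspect, qualifier=""):
    return "\t".join(["DB", "ID:" + symbol, symbol, qualifier, go_id, "REF", "IEA",
                      "", aspect, "", "", "protein", "taxon:0", "20240101", "DB", "", ""])


toy_gaf = [
    "!gaf-version: 2.2",
    gaf_row("geneA", "GO:0003824", "F"),
    gaf_row("geneB", "GO:0003674", "F"),  # root term only
    gaf_row("geneC", "GO:0003824", "F", qualifier="NOT"),
    gaf_row("geneD", "GO:0008150", "P"),  # biological process only
]
my_genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
print(count_mf_annotated(my_genes, toy_gaf))                              # 1
print(count_mf_annotated(my_genes, toy_gaf, uninformative=frozenset()))  # 2
```

Passing an empty `uninformative` set gives the plain "any MF annotation" count; the fraction of your input genes is then this number divided by `len(set(my_genes))`.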
I assume that, out of the 6100 genes in your input, 3962 were associated with GO functional terms.
I presume you already know about GeneSCF:
Gene Set Clustering based on Functional annotation (GeneSCF)