Hey, I have been using Gene Ontology to try to understand some differentially expressed genes, and I am now wondering whether I should trust the GO annotations.
I was looking at SERPINB2. For this gene I am getting the GO terms GO:0005576 'extracellular region' and GO:0005615 'extracellular space'. However, when I cross-reference with the Human Protein Atlas (HPA), which has antibody staining assays, the protein seems to be clearly inside the cell, around the Golgi apparatus.
I have been trying to find the references behind those GO annotations, without success. So which one should be trusted, the annotation or the antibody staining?
As an aside, when you have discrepancies between two data sources, you should question both data sources. Here, you imply that not seeing the protein in the extracellular space in the HPA antibody staining means the GO annotation is likely wrong. The problem is that the antibody staining may be done under fixation conditions that do not preserve extracellular proteins and so the HPA can't report on them. Absence of evidence is not evidence of absence. When dealing with experimental data, you need to make sure you understand their limitations before reaching conclusions.
Well, I should have added that I looked at cell-surface mass spec data for many cell lines and didn't find much SERPINB2. Also, for some reason the source of the GO annotation cannot be verified; the link does not work for me. I am not saying the annotation is false, but in this case I am more inclined to discard it until I can verify the source.
As far as I could figure out, one annotation comes from the Reactome database, which is largely manually curated, and the other from the Panther database annotation of the serpin family. On the other hand, experimental evidence is often conflicting, and a lot depends on context that current GO annotations don't capture, e.g. maybe SERPINB2 is extracellular only in certain cell types or conditions, and this may or may not be relevant to your work. In this case, SERPINB2 is a well-known secreted protein (its synonym is plasminogen activator inhibitor 2), but it is secreted only in response to some signal, see for example this paper. My point is that, in the end, trusting a data set is a subjective matter. However, for statistical data analysis, the handling has to be consistent. It is fine to go gene by gene and review the evidence, but when playing this game, one often needs to go deep into the literature to form an opinion.
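If it helps with tracing where an annotation comes from and with handling all genes consistently, here is a minimal Python sketch that reads lines in the GO Annotation File (GAF 2.x) layout and lists the reference, evidence code and assigning database for a gene symbol. The three sample rows are invented placeholders for illustration, not real records: the references are dummy strings, and the evidence codes (TAS for the Reactome row, IBA for the Panther row) are my assumption about what such annotations typically carry, not something I looked up. `make_row`, `parse_gaf` and `GENE_X` are hypothetical names.

```python
import csv

# 0-based positions of the GAF 2.x columns used here.
COLUMNS = {"symbol": 2, "go_id": 4, "reference": 5, "evidence": 6, "assigned_by": 14}

# GO evidence codes for experimental support, including high-throughput ones.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                      "HTP", "HDA", "HMP", "HGI", "HEP"}


def make_row(symbol, go_id, reference, evidence, assigned_by):
    """Build one 17-column GAF-style line (hypothetical helper for the demo)."""
    fields = [""] * 17
    fields[0] = "UniProtKB"
    fields[1] = "ID_PLACEHOLDER"  # placeholder, not a real accession
    for name, value in zip(COLUMNS, (symbol, go_id, reference, evidence, assigned_by)):
        fields[COLUMNS[name]] = value
    return "\t".join(fields)


def parse_gaf(lines):
    """Yield one dict per annotation line, skipping '!' header/comment lines."""
    data_lines = (line for line in lines if line.strip() and not line.startswith("!"))
    for fields in csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        yield {name: fields[index] for name, index in COLUMNS.items()}


# Invented sample rows; evidence codes and references are assumptions.
SAMPLE_GAF = [
    "!gaf-version: 2.2",
    make_row("SERPINB2", "GO:0005576", "REF:placeholder-1", "TAS", "Reactome"),
    make_row("SERPINB2", "GO:0005615", "REF:placeholder-2", "IBA", "Panther"),
    make_row("GENE_X", "GO:0005615", "REF:placeholder-3", "IDA", "UniProt"),
]

records = list(parse_gaf(SAMPLE_GAF))

# Gene-by-gene review: see where each annotation for one symbol comes from.
serpinb2 = [r for r in records if r["symbol"] == "SERPINB2"]
for r in serpinb2:
    print(r["go_id"], r["evidence"], r["assigned_by"], r["reference"])

# Consistent handling: apply the same evidence filter to every gene.
experimental = [r for r in records if r["evidence"] in EXPERIMENTAL_CODES]
```

To run it on real data you would replace `SAMPLE_GAF` with the lines of a downloaded GAF file. Whatever filter you pick (experimental codes only, or everything), the point is to apply the same rule to all genes rather than dropping annotations for the one gene you happened to look at closely.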