There are 19035 protein-coding rows in the HGNC download, but the UniProt column (19035 entries) collapses to 18883 unique IDs, implying 432 one-to-many Swiss-Prot > HGNC mappings.
However, when I query UniProt with database:(type:hgnc) AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]", I get 19960 from the 20,168, implying 905 for the same 1:many, but I can only find 152 duplicates in the column.
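For anyone who wants to repeat the duplicate count, here is a minimal sketch of how the collapse could be checked. It is not the exact procedure I used: the column name `uniprot_ids`, the helper `summarize_ids` and the sample accessions are hypothetical placeholders, and it assumes a tab-separated download with one ID per cell. Note that 19035 - 18883 = 152, which matches the number of duplicates found in the column.

```python
import csv
import io
from collections import Counter


def summarize_ids(rows, column):
    """Count filled rows, distinct IDs, and IDs that appear on more than one row."""
    # Skip rows where the ID cell is missing or blank.
    ids = [(row.get(column) or "").strip() for row in rows]
    ids = [i for i in ids if i]
    counts = Counter(ids)
    return {
        "filled_rows": len(ids),
        "distinct_ids": len(counts),
        # Rows that disappear when the column is collapsed to unique values.
        "surplus_rows": len(ids) - len(counts),
        "repeated_ids": {i: n for i, n in counts.items() if n > 1},
    }


# Hypothetical stand-in for the real download (assumed column names and accessions).
SAMPLE = (
    "symbol\tuniprot_ids\n"
    "GENE1\tP00001\n"
    "GENE2\tP00002\n"
    "GENE3\tP00002\n"
    "GENE4\t\n"
    "GENE5\tP00003\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
summary = summarize_ids(rows, "uniprot_ids")
print(summary)
```

With the real file, replace `io.StringIO(SAMPLE)` with an open file handle; `surplus_rows` is then the figure to compare against the difference between the row count and the collapsed count.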
Can anyone who's been doing something similar help out here? (Note: it falls between two help desks.)
After some hours of head scratching, cross-checking and making Venn intersects (see Twitter), I think I have an explanation. So no one needs to dive into this if they have better things to do, but I will hold off on my conclusions for a time, just to see if anyone wants to come up with an independently corroborating explanation (which I actually think is important for the domain of protein annotation).
Thanks for all the comments. I managed the review in the end: "Last rolls of the yoyo: Assessing the human canonical protein count [version 1; referees: awaiting peer review]" https://f1000research.com/articles/6-448/v1 Feedback welcome.