I just looked through the statistics for GENCODE: http://www.gencodegenes.org/archive_stats.html
I noticed that the number of protein-coding genes is on the decrease.
This is very interesting, and it really puzzles me.
Version 21 (June 2014 freeze, GRCh38) - Ensembl 77 --> 19881
Version 20 (April 2014 freeze, GRCh38) - Ensembl 76 --> 19942
Version 19 (July 2013 freeze, GRCh37) - Ensembl 74 --> 20345
Version 18 (April 2013 freeze, GRCh37) - Ensembl 73 --> 20318
So, is it not clear-cut whether a gene is protein-coding or not?
Could there be small differences among individuals?
BTW, according to HGNC, there are just 19003 protein-coding genes: http://www.genenames.org/cgi-bin/statistics
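For reference, here is a small sketch that computes the release-to-release change in the GENCODE counts quoted above. The numbers are copied from this post; the function name `release_deltas` is just made up for illustration.

```python
# Protein-coding gene counts copied from the GENCODE statistics quoted above,
# ordered from oldest to newest release.
counts = [
    ("v18", "GRCh37", 20318),
    ("v19", "GRCh37", 20345),
    ("v20", "GRCh38", 19942),
    ("v21", "GRCh38", 19881),
]

def release_deltas(rows):
    """Return (from_version, to_version, absolute change, percent change) per step."""
    deltas = []
    for (prev_v, _, prev_n), (cur_v, _, cur_n) in zip(rows, rows[1:]):
        change = cur_n - prev_n
        deltas.append((prev_v, cur_v, change, 100.0 * change / prev_n))
    return deltas

for prev_v, cur_v, change, pct in release_deltas(counts):
    print(f"{prev_v} -> {cur_v}: {change:+d} ({pct:+.2f}%)")
```

The count actually went up slightly from v18 to v19. The biggest drop (403 genes) is between v19 and v20, which is also where the assembly changes from GRCh37 to GRCh38.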
As Denise wrote, there are many reasons why the number of genes changes. It comes down to different assemblies and to the different amounts and types of information used in the annotation process. Another important source of variation is that different resources define a gene differently.
I have heard back from my HGNC colleagues. They list only the genes that have approved symbols, which could explain some of the discrepancy. I was told it is not uncommon to see discrepancies between different resources.
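If you want to see how such a number depends on what gets counted, you can tally gene records in an annotation file yourself. Below is a minimal sketch, not how GENCODE or HGNC produce their statistics. It assumes a GENCODE-style GTF with nine tab-separated columns, where gene-level records have `gene` in the third column and a `gene_type "..."` attribute in the ninth (Ensembl's own GTF files, as far as I know, call this attribute `gene_biotype`). The function name and the sample records are invented for illustration.

```python
import re
from collections import Counter

# Assumption: GENCODE-style attribute naming, e.g. gene_type "protein_coding";
GENE_TYPE = re.compile(r'gene_type "([^"]+)"')

def count_gene_types(lines):
    """Count gene-level GTF records per gene_type value."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Only gene-level records; transcripts and exons would inflate the count.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_TYPE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts

# Invented sample records, for illustration only.
sample = [
    "##description: made-up example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tHAVANA\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2"; gene_type "pseudogene";',
    'chr2\tENSEMBL\tgene\t50\t700\t.\t+\t.\tgene_id "G3"; gene_type "protein_coding";',
]

print(count_gene_types(sample))
```

For a real file you would pass an open file handle instead of `sample`. Counting a different feature type, or restricting to genes with an approved symbol, gives a different total, which is the kind of definitional difference described above.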
Thank you very much. I see.