I became interested in tracking this down and repeated what the OP did. I am trying to determine whether this is an error in the GFF file or an error in the validator.
I do get the same error as the OP:
gt gff3validator: error: CDS feature on line 621884 in file "GCF_000002035.5_GRCz10_genomic.gff" has the wrong phase 1 (should be 0
To obtain the line:
awk 'NR==621884' GCF_000002035.5_GRCz10_genomic.gff
This produces the offending line:
NC_007121.6 BestRefSeq CDS 21750127 21750129 . + 1 ID=cds20891;Parent=rna31380;Dbxref=GeneID:553997,Genbank:NP_001019280.1,ZFIN:ZDB-GENE-050609-5;Name=NP_001019280.1;Note=The RefSeq protein has 9 substitutions compared to this genomic sequence;exception=annotated by transcript or proteomic data;gbkey=CDS;gene=pcdh1g9;product=protocadherin 1 gamma 9;protein_id=NP_001019280.1
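For anyone who prefers to pull the line out in Python and look at the phase column directly, here is a minimal sketch. The helper names (nth_line, parse_gff3_record) are my own and not part of any library; it assumes the file is a standard tab-separated GFF3 with nine columns.

```python
from itertools import islice

# The nine GFF3 columns, in order.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")

def nth_line(lines, n):
    """Return the n-th line (1-based, like awk's NR) of an iterable of lines, or None."""
    return next(islice(lines, n - 1, n), None)

def parse_gff3_record(line):
    """Split one tab-separated GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    return dict(zip(GFF3_COLUMNS, fields))

# Usage against the real file (file name and line number taken from the error message):
# with open("GCF_000002035.5_GRCz10_genomic.gff") as handle:
#     record = parse_gff3_record(nth_line(handle, 621884))
#     print(record["type"], record["start"], record["end"], record["phase"])
```

All values come back as strings, so the phase of the line above would be "1", not the integer 1.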
The GenBank record at https://www.ncbi.nlm.nih.gov/nuccore/NC_007121.6 annotates this CDS as:
join(21744267..21746679,21746681,21750127..21750129,
21750131..21750164,21855170..21855228,21858956..21859074,
21860119..21860155,21860973..21860979)
/gene="pcdh1g9"
/gene_synonym="DrPcdh1g8"
/inference="similar to AA sequence (same
species):RefSeq:NP_001019280.1"
/exception="annotated by transcript or proteomic data"
/note="The RefSeq protein has 9 substitutions compared to
this genomic sequence; Derived by automated computational
analysis using gene prediction method: BestRefSeq."
/codon_start=1
/product="protocadherin 1 gamma 9"
/protein_id="NP_001019280.1"
/db_xref="GI:66773380"
/db_xref="GeneID:553997"
/db_xref="ZFIN:ZDB-GENE-050609-5"
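This shows that the offending feature (21750127..21750129) is the third CDS segment in the join. To get its phase, add up the lengths of the preceding CDS segments and see how far that total is from the next multiple of 3. That distance is the phase.
>>> size = (21746679 - 21744267 + 1) + 1  # first segment plus the single-base second segment (21746681)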
>>> divmod(size, 3)
(804, 2)
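The remainder is 2, so the two preceding segments end two bases into a codon, and the first base of this segment completes that codon. The next full codon therefore starts one base in, so the phase should be 1.
In other words, the GFF is correct and the validator is incorrect.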
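The same arithmetic can be repeated for every segment of the join. This is a minimal sketch, assuming a plus-strand CDS listed in transcript order with /codon_start=1 as in the record above; the function name cds_phases is my own, and the coordinates are copied from the join.

```python
# CDS segments copied from the join(...) in the GenBank record (1-based, inclusive).
SEGMENTS = [
    (21744267, 21746679),
    (21746681, 21746681),
    (21750127, 21750129),
    (21750131, 21750164),
    (21855170, 21855228),
    (21858956, 21859074),
    (21860119, 21860155),
    (21860973, 21860979),
]

def cds_phases(segments, codon_start=1):
    """Return the GFF3 phase of each segment of a plus-strand CDS.

    The phase is the number of bases to skip at the start of a segment
    to reach the first base of the next complete codon.
    """
    phases = []
    phase = codon_start - 1  # codon_start=1 means the first segment has phase 0
    for start, end in segments:
        phases.append(phase)
        length = end - start + 1
        # Bases left over after this segment decide the next segment's phase.
        phase = (3 - (length - phase) % 3) % 3
    return phases

if __name__ == "__main__":
    for (start, end), phase in zip(SEGMENTS, cds_phases(SEGMENTS)):
        print(start, end, phase)
```

For the third segment (21750127..21750129) this gives phase 1, which agrees with the value in the GFF file. A minus-strand CDS would need the segments walked from the highest coordinate down, which this sketch does not handle.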
Why do you need to "validate" this data from NCBI? Is there a tool or analysis that is not working with these annotations?
FWIW I would hope that all GFF files in RefSeq validate.
I would think so, and this is one of the primary model species used in biomedical research so I doubt there are any major issues. It seems rather academic to validate a file by one definition just for the sake of it.