Within NCBI, the Protein Clusters database provides an FTP server (linked from https://www.ncbi.nlm.nih.gov/proteinclusters/faq/) with text files containing protein clusters. The PCLA cluster set for prokaryotic genomes, which is the one I'm interested in, was last updated in 2017. I looked for newer versions without success. So I wonder whether people now access this database in a different manner (through an API, maybe?), or whether it simply has not been updated in the last 6 years.
Would you be able to enlighten me on how I can access the newest protein cluster data? To give some context, I have a bunch of GFF files with annotated genes that I'm mapping to protein families through a pipeline. As far as I can tell, I can only use genomes that were added to NCBI up to 2017, and I would like to be able to use newer genomes.
There does not seem to be a new version of the protein clusters after 2017. As you may know, the number of whole genomes on NCBI has exploded, and it may simply be impractical to rebuild the clusters regularly.
That said, please email the NCBI Help Desk and ask whether this project is active or deprecated, then post their response here.