As part of a data-supported druggable proteome analysis (i.e. an update to https://www.slideshare.net/cdsouthan/update-on-the-druggable-proteome ), I'd like to pull lists of PubChem BioAssay targets (e.g. by Gene ID) and then cross-map them to UniProt IDs. As can be seen on slide 8, this works well for the other four databases indexed in the UniProt Chemistry x-refs.
The PubChem stats for this seem all over the place, e.g. Protein Targets: 10,857 and Gene Targets: 22,106 (CTD bloat for the latter?).
NOBA pcassay_protein_target[filt] gave 160,112, nominally collapsing to 17,895 Protein Targets, but then transforming to only 800 gene entries.
However, seeing as BioAssay has circularity with ChEMBL (and the ChEMBL target x-refs are already in UniProt), what I really want are the protein mappings that do not originate from ChEMBL.
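For concreteness, the filtering I have in mind looks roughly like the sketch below. It assumes the target list has already been exported as (UniProt accession, depositor source) pairs. That record layout, the function name `non_chembl_accessions` and the toy accessions and depositor names are all hypothetical and are not taken from any PubChem export format. An accession is kept if at least one of its mappings comes from a source other than ChEMBL.

```python
from collections import defaultdict


def non_chembl_accessions(records, chembl_name="ChEMBL"):
    """Return sorted accessions with at least one non-ChEMBL mapping.

    `records` is an iterable of (accession, source) pairs. This layout is
    an assumption for illustration, not a real PubChem download format.
    """
    sources_by_accession = defaultdict(set)
    for accession, source in records:
        # Normalise case and whitespace so "chembl" and "ChEMBL " match.
        sources_by_accession[accession].add(source.strip().casefold())

    chembl_key = chembl_name.casefold()
    return sorted(
        accession
        for accession, sources in sources_by_accession.items()
        if sources - {chembl_key}  # something other than ChEMBL remains
    )


# Toy records with placeholder accessions and depositor names.
toy_records = [
    ("ACC_A", "ChEMBL"),
    ("ACC_A", "Depositor X"),
    ("ACC_B", "ChEMBL"),
    ("ACC_C", "Depositor Y"),
]
independent = non_chembl_accessions(toy_records)
print(independent)
```

The stricter alternative would be to drop any accession that ChEMBL touches at all, which would shrink the list further. Which of the two is the right definition for the slide is part of what I'm asking.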
Rather than get into the technicalities off the bat, has anyone actually done this already? If I can get their recent UniProt list, I'd be pleased to acknowledge them on the slide.