8.6 years ago
Harshal
I have about 30K RefSeq protein accessions and would like to extract their conserved domains from CDD and Pfam. I used the following script
https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#BatchRPSBWebAPI_samplePERLscript
to extract the data through the API, but it is taking close to an hour for a single accession. Is there a quicker way to get this information, such as flat files that already map CDD and Pfam domains to RefSeq IDs?
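One possible cause of the slowness is sending one accession per request. The sketch below assumes the batch service behind the linked script accepts many queries in a single request. It only splits the accession list into chunks so that each request carries many IDs. The default batch size of 1000 is a hypothetical value, not a documented limit. Submitting and polling each batch is left to the linked script and is not shown here.

```python
from typing import Iterator, List, Sequence

# Hypothetical default; check the Batch CD-Search documentation for the
# actual per-request query limit before relying on this value.
DEFAULT_BATCH_SIZE = 1000


def batch_accessions(
    accessions: Sequence[str], batch_size: int = DEFAULT_BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive chunks of `accessions`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(accessions), batch_size):
        yield list(accessions[start:start + batch_size])


if __name__ == "__main__":
    # The example IDs from the question, split into chunks of 2 for illustration.
    ids = ["NP_000005", "NP_000006", "NP_000213", "NP_002975", "NP_005219"]
    for batch in batch_accessions(ids, batch_size=2):
        print(batch)
```

With roughly 30K accessions and a batch size of 1000, this produces about 30 requests instead of 30,000.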
Some example RefSeq IDs:
NP_000005
NP_000006
NP_000213
NP_002975
NP_005219
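The example list mixes commas and line breaks, so it helps to normalize the IDs before submitting them. The sketch below splits on commas and whitespace, drops duplicates while keeping input order, and rejects tokens that do not look like RefSeq protein accessions (prefixes AP_, NP_, WP_, XP_, YP_, with an optional version suffix such as `.1`). The function name `parse_refseq_ids` is hypothetical, and the choice to raise on bad tokens rather than skip them is a design assumption.

```python
import re
from typing import List

# RefSeq protein accession prefixes, with an optional ".N" version suffix.
REFSEQ_PROTEIN = re.compile(r"^(?:AP|NP|WP|XP|YP)_\d+(?:\.\d+)?$")


def parse_refseq_ids(text: str) -> List[str]:
    """Split on commas/whitespace, drop blanks and duplicates, keep input order."""
    seen = set()
    ids = []
    for token in re.split(r"[,\s]+", text):
        if not token or token in seen:
            continue
        if not REFSEQ_PROTEIN.match(token):
            raise ValueError(f"not a RefSeq protein accession: {token!r}")
        seen.add(token)
        ids.append(token)
    return ids


if __name__ == "__main__":
    raw = """NP_000005,
NP_000006
NP_000213,
NP_002975,
NP_005219"""
    # prints ['NP_000005', 'NP_000006', 'NP_000213', 'NP_002975', 'NP_005219']
    print(parse_refseq_ids(raw))
```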