Hello,
Would anyone possibly know of an algorithm that can map NDC drug codes to their corresponding PubChem ids or an intermediate identifier? Trying to avoid the use of text mining algorithms as much as possible.
This is quite straightforward with the Cactvs Cheminformatics Toolkit (free academic downloads at www.xemistry.com/academic), though it relies on Internet-based compound name resolution, which is not quite foolproof:
a) Download the FDA NDC database zip file and expand. It contains a file 'product.txt' with the relevant data.
b) Use one of the toolkit interpreters to run either a simple Tcl script:
    # resolved() and unresolved() cache lookups so each name is tried only once
    table dictloop [table read product.txt] row {
        set ndc [dict get $row PRODUCTNDC]
        puts -nonewline "$ndc\t"
        set d [dict create]
        # SUBSTANCENAME may list several substances separated by semicolons
        foreach s [split [dict get $row SUBSTANCENAME] \;] {
            set s [string trim $s]
            if {[info exists resolved($s)]} {
                dict set d $s $resolved($s)
            } elseif {[info exists unresolved($s)] || [catch {ens create $s} eh]} {
                puts stderr "failed to resolve substance name $s"
                set unresolved($s) 1
            } else {
                if {[catch {ens get $eh E_CID} cid]} {
                    puts stderr "no PubChem CID for substance $s"
                    set unresolved($s) 1
                } else {
                    dict set d $s $cid
                    set resolved($s) $cid
                }
                # always release the ensemble created by the name lookup
                ens delete $eh
            }
        }
        puts $d
    }
c) or a Python 3 script:
    # Run this in the toolkit's Python interpreter, which supplies Table and Ens.
    import sys

    t = Table.Read('product.txt')
    t.iteratorstyle = 'dict'
    resolved = {}    # substance name -> PubChem CID
    unresolved = {}  # substance names that already failed
    for row in t:
        ndc = row['PRODUCTNDC']
        print(ndc, end='\t')
        d = {}
        # SUBSTANCENAME may list several substances separated by semicolons
        for s in [w.strip() for w in row['SUBSTANCENAME'].split(';')]:
            if s in resolved:
                d[s] = resolved[s]
            elif s in unresolved:
                print('failed to resolve substance name', s, file=sys.stderr)
            else:
                try:
                    e = Ens(s)
                    try:
                        d[s] = resolved[s] = e.E_CID
                    except Exception:
                        print('no PubChem CID for', s, file=sys.stderr)
                        unresolved[s] = True
                    finally:
                        # always release the ensemble created by the name lookup
                        e.delete()
                except Exception:
                    print('failed to resolve substance name', s, file=sys.stderr)
                    unresolved[s] = True
        print(d)
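For anyone who wants to check the parsing and caching logic without the toolkit installed, the same loop can be sketched in plain Python. This is only an illustration under some assumptions: that product.txt is tab-delimited with a header row, and that name resolution is supplied by a caller-provided function. The names map_ndc_to_cids, resolve_name and toy_resolver are hypothetical and not part of Cactvs or any FDA tooling, and the NDC codes in the sample are made up.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Optional


def map_ndc_to_cids(
    lines: Iterable[str],
    resolve_name: Callable[[str], Optional[int]],
) -> Dict[str, Dict[str, int]]:
    """Map each PRODUCTNDC to {substance name: CID} for the names that resolve.

    resolve_name is a hypothetical hook: it returns a CID, or None on failure.
    """
    cache: Dict[str, Optional[int]] = {}  # one lookup per distinct name
    result: Dict[str, Dict[str, int]] = {}
    # Assumption: tab-delimited text with a header row and no quoting rules.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        cids = result.setdefault(row["PRODUCTNDC"], {})
        # SUBSTANCENAME may list several substances separated by semicolons.
        for name in (row["SUBSTANCENAME"] or "").split(";"):
            name = name.strip()
            if not name:
                continue
            if name not in cache:
                cache[name] = resolve_name(name)
            if cache[name] is not None:
                cids[name] = cache[name]
    return result


# Toy data standing in for product.txt; the NDC codes are invented.
sample = io.StringIO(
    "PRODUCTNDC\tSUBSTANCENAME\n"
    "0000-0001\tASPIRIN; CAFFEINE\n"
    "0000-0002\tASPIRIN\n"
    "0000-0003\tNOT A REAL NAME\n"
)

toy_table = {"ASPIRIN": 2244, "CAFFEINE": 2519}
calls = []  # records which names actually reached the resolver


def toy_resolver(name: str) -> Optional[int]:
    calls.append(name)
    return toy_table.get(name)


mapping = map_ndc_to_cids(sample, toy_resolver)
for ndc, cids in mapping.items():
    print(ndc, cids, sep="\t")
```

With a real file, open product.txt and pass the file object as the first argument. The cache matters in practice because the same substance names recur across many products, so each distinct name costs only one network lookup.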
I am curious as to what you perceive as the utility for this particular mapping?