Question

Retrieving domain loci

5.9 years ago

jcthomas000 • 0

I've got a list of 200 Ensembl gene IDs and I want to get the start and stop nucleotides for the known conserved domains in each gene. At https://www.ncbi.nlm.nih.gov/gene/2475, the RefSeq section has a nice little summary of the conserved domains, but I haven't figured out how to get this information programmatically with Entrez, or found another database to get it from. The full protein record lists every repeat and every interaction site as a separate Region, which isn't useful. I'm pretty sure I should be querying the "cdd" database, but I can't find any useful documentation for it.

Anyone done this before?

Cheers.

entrez database protein domains • 1.1k views

Do you want the relative coordinates? Or do you want them mapped to the reference genome?

benformatics 4.0k • 5.9 years ago

5.9 years ago

vkkodali_ncbi ★ 3.8k

Conserved domains are annotated on the RefSeq proteins. If you are starting with NCBI GeneIDs, then you may want to first fetch the proteins annotated on that gene and then extract the CDD domains for each protein. You can use Entrez Direct for this as follows:

elink -db gene -target protein -name gene_protein_refseq -id 2475 \
  | efetch -format gpc \
  | xtract -insd Region INSDInterval_from INSDInterval_to region_name note db_xref \
  | grep 'CDD:'
NP_004949.1     363     2549    TEL1            Phosphatidylinositol kinase or protein kinase, PI-3 family [Signal transduction mechanisms]; COG5032    CDD:227365
NP_004949.1     655     681     HEAT repeat     HEAT repeat [structural motif]  CDD:293787
NP_004949.1     691     721     HEAT repeat     HEAT repeat [structural motif]  CDD:293787
...
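
For a list of IDs, the same pipeline can be wrapped in a loop. The following is a minimal sketch, not a tested production script: the function names are made up for illustration, the input is assumed to be a file of NCBI GeneIDs (one per line, so Ensembl IDs would need to be mapped first), and the column order is assumed to match the output shown above. The reported positions are residue positions on the protein; the last helper converts them to nucleotide positions relative to the start of the coding sequence, assuming the protein starts at the first codon.

```shell
#!/bin/sh
# Sketch only. Requires Entrez Direct (elink, efetch, xtract) on PATH for
# the first two functions; the last two are plain awk and run offline.

# Print CDD-annotated regions for one NCBI GeneID as tab-separated rows:
# accession, start, end, region_name, note, db_xref.
cdd_regions_for_gene() {
  # stdin is redirected so elink does not swallow the ID list read by the loop
  elink -db gene -target protein -name gene_protein_refseq -id "$1" < /dev/null \
    | efetch -format gpc \
    | xtract -insd Region INSDInterval_from INSDInterval_to region_name note db_xref \
    | grep 'CDD:'
}

# Run the function above for every GeneID in a file (one ID per line).
cdd_regions_for_list() {
  while read -r gene_id; do
    if [ -n "$gene_id" ]; then
      cdd_regions_for_gene "$gene_id"
    fi
  done < "$1"
}

# Keep accession, start, end, name and CDD ID; drop the free-text note.
# Assumes db_xref is the sixth tab-separated column, as in the output above.
cdd_columns() {
  awk -F '\t' -v OFS='\t' '$6 ~ /^CDD:/ { print $1, $2, $3, $4, $6 }'
}

# Convert 1-based residue positions (columns 2 and 3) into 1-based
# nucleotide positions relative to the first base of the coding sequence.
aa_to_cds_nt() {
  awk -F '\t' -v OFS='\t' '{ $2 = 3 * ($2 - 1) + 1; $3 = 3 * $3; print }'
}

# Offline demonstration using one row copied from the output above.
printf 'NP_004949.1\t655\t681\tHEAT repeat\tHEAT repeat [structural motif]\tCDD:293787\n' \
  | cdd_columns | aa_to_cds_nt
```

With a hypothetical file gene_ids.txt, usage would look like: cdd_regions_for_list gene_ids.txt | cdd_columns > domains.tsv. Mapping to genome coordinates is a separate step that needs the exon structure of each transcript and is not covered here.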

score 0 · Accepted Answer · 2019-01-07

5.9 years ago

jcthomas000 • 0

Figured it out! You want to use Ensembl BioMart. You can query it with a list of gene IDs and choose which fields are returned (including domain start and end positions) by clicking on "Attributes".
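
The same kind of query can also be sent to BioMart from the command line instead of the web form. This is a rough sketch under several assumptions: the dataset name (hsapiens_gene_ensembl), the filter name (ensembl_gene_id) and the Pfam attribute names (pfam, pfam_start, pfam_end) are what I believe the human gene dataset uses, but they should be checked against the attribute list shown in the BioMart interface. The function names and the ENSG_EXAMPLE IDs are placeholders.

```shell
#!/bin/sh
# Sketch only: builds a BioMart XML query and, optionally, posts it with curl.

# Join the IDs in a file (one per line) with commas, skipping blank lines.
join_ids() {
  grep -v '^[[:space:]]*$' "$1" | paste -s -d ',' -
}

# Print the XML query for a comma-separated list of Ensembl gene IDs.
# Dataset, filter and attribute names are assumptions; verify in BioMart.
biomart_query_xml() {
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="ensembl_gene_id" value="$1"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_peptide_id"/>
    <Attribute name="pfam"/>
    <Attribute name="pfam_start"/>
    <Attribute name="pfam_end"/>
  </Dataset>
</Query>
EOF
}

# Send the query for all IDs in a file to the Ensembl BioMart service.
# Needs network access; prints tab-separated rows on success.
fetch_pfam_domains() {
  curl -s --data-urlencode "query=$(biomart_query_xml "$(join_ids "$1")")" \
    "https://www.ensembl.org/biomart/martservice"
}

# Offline demonstration: show the query that would be sent for two
# placeholder IDs (these are not real Ensembl IDs).
biomart_query_xml "ENSG_EXAMPLE_1,ENSG_EXAMPLE_2"
```

With a hypothetical file ensembl_ids.txt, usage would be: fetch_pfam_domains ensembl_ids.txt > pfam_domains.tsv. The start and end columns returned this way are positions on the protein, not genome coordinates.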
