Get protein domain information from gene name in R
7.6 years ago
asd ▴ 20

I would like to get the protein domain names, with their start and end positions, for a gene given its name, in R. A web API is also acceptable.

My goal is to plot DNA mutations at the protein domain level, like the cBioPortal MutationMapper, but programmatically in R. I know that this information is available in the Pfam database, but I don't know how to retrieve it.

I have read previous posts on similar topics, but I didn't find a solution. Thank you for your help!

R package protein • 4.7k views
7.6 years ago

You can do this using Ensembl. Use either the BioMart interface or the Perl API.

EDIT: Forgot the R bit: there's the biomaRt Bioconductor package.
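
A minimal sketch of the biomaRt route, in case it helps. It assumes the human gene dataset and that the Pfam columns are exposed under the attribute names `pfam`, `pfam_start` and `pfam_end`; if your Ensembl release names them differently, `listAttributes()` on the mart object will show what is available. The function name `fetch_pfam_domains` is made up for illustration.

```r
# Sketch only: needs the Bioconductor package biomaRt and network access.
# `fetch_pfam_domains` is a hypothetical helper, not part of biomaRt.
fetch_pfam_domains <- function(gene_symbol,
                               dataset = "hsapiens_gene_ensembl") {
  # Connect to the Ensembl genes mart for the chosen species dataset.
  mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset)
  # One row per (transcript, protein, Pfam hit); attribute names are assumed.
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "ensembl_transcript_id",
                   "ensembl_peptide_id", "pfam", "pfam_start", "pfam_end"),
    filters = "hgnc_symbol",
    values = gene_symbol,
    mart = mart
  )
}

# Usage (commented out because it queries the Ensembl servers):
# tp53 <- fetch_pfam_domains("TP53")
# head(tp53)
```

Keeping the transcript and peptide IDs in the output makes it visible which protein each domain row belongs to.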


Thank you, BioMart returns the required results, but it contains too many rows, not just the ones annotated as 'Pfam' and 'low_complexity' on the Pfam website.

How can I annotate the results with the source and domain columns?


The HTML results view of Ensembl BioMart looks buggy: results are returned per transcript, even if you haven't selected the transcript IDs to be returned and even if you request unique results only. However, exporting unique results as a TSV file seems to work as expected.


For TP53, the unique BioMart TSV contains 17 rows, but the Pfam website shows just 13. BioMart has a domain from 1 to 156, while Pfam has 1 to 23.

Why is there a difference?


It looks like the unique results in the TSV file still contain rows corresponding to different transcripts, and so likely to slightly different proteins. Since you want to locate mutations relative to protein domains, you should consider all proteins produced by a given gene anyway. Note that Pfam has no notion of genes or of an underlying genome; it just annotates proteins from UniProt, usually only the canonical sequence and not the variants, whereas Ensembl annotates all proteins.
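
To illustrate treating each protein separately, here is a small offline sketch. The table mimics the shape of a BioMart export, but the peptide IDs, Pfam IDs and coordinates are invented for the example, and `domains_at` is a hypothetical helper. It also assumes the mutation position is already expressed as an amino-acid coordinate; in practice that coordinate can differ between isoforms.

```r
# Toy table in the shape of a BioMart export; all values are made up.
domains <- data.frame(
  ensembl_peptide_id = c("P_A", "P_A", "P_B"),
  pfam       = c("PF_X", "PF_Y", "PF_Y"),
  pfam_start = c(5, 100, 60),
  pfam_end   = c(30, 290, 250),
  stringsAsFactors = FALSE
)

# Return the unique domain rows whose interval contains the position.
# Rows without a Pfam hit (NA coordinates) are dropped.
domains_at <- function(domains, aa_pos) {
  hit <- !is.na(domains$pfam_start) & !is.na(domains$pfam_end) &
    domains$pfam_start <= aa_pos & aa_pos <= domains$pfam_end
  unique(domains[hit, , drop = FALSE])
}

hits <- domains_at(domains, 175)
# Group the matching domain IDs by protein, one list element per isoform.
split(hits$pfam, hits$ensembl_peptide_id)
```

Grouping by peptide ID keeps the isoforms apart, so a domain that is shifted or missing in one transcript's protein does not get mixed with the canonical one.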
