protein domain start-stop codons database
5.4 years ago
cocchi.e89 ▴ 290

I'm working with some human exome variants and have annotated their protein domain features with VEP.

An example result: Pfam_domain:PF01762&hmmpanther:PTHR11214&hmmpanther:PTHR11214:SF28&Low_complexity_(Seg):seg

I need to get from each domain back to its codon coordinates (e.g. the start and end codons of PANTHER PTHR11214, or the genomic positions if possible). Is there a database from which I can retrieve this information?

I found a similar post, but it describes retrieving each domain manually; I need to automate the process.

Thanks a lot in advance for any help!

codon database domain vep • 1.0k views

Getting to the actual domain position is not trivial. You can get gene names and genomic coordinates using Entrez Direct (EDirect) and the Pfam IDs:

$ esearch -db cdd -query "PF01762" | elink -target gene | esummary | xtract -pattern DocumentSummary -if ScientificName -equals "Homo sapiens" -element Id,Name,ScientificName,ChrAccVer,ChrStart,ChrStop

You will get something like (truncated for space):

56913   C1GALT1 Homo sapiens    NC_000007.14    NC_000007.14    NC_000007.14    NC_000007.14    NC_018918.2     NC_000007.14    NC_018918.2     NC_000007.13    AC_000068.1     AC_000139.1     NC_018918.2   7182546 7182546 7182546 7182546 7182546 7222243 7182546 7222243 7222177 7269405 7070738 7222243 7248650 7248650 7248650 7248650 7288281 7248650 7288281 7288281 7335443 7136783 7288281
10317   B3GALT5 Homo sapiens    NC_000021.9     NC_000021.9     NC_000021.9     NC_000021.9     NC_018932.2     NC_000021.9     NC_018932.2     NC_000021.8     AC_000153.1     NC_018932.2     39612939      39612939        39612939        39612939        39612939        40545617        39612939        40545617        40928368        26454203        40545617        39673136        39673136     39662888 39662888        40595560        39662888
5.4 years ago
Emily 24k

The domains that VEP reports are the ones stored in the Ensembl database, so the way to retrieve them is with BioMart.
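As a rough illustration of automating that BioMart lookup, the sketch below builds a BioMart XML query for a list of Pfam IDs and posts it to the Ensembl martservice endpoint. The dataset name (`hsapiens_gene_ensembl`), the filter name (`pfam`) and the attribute names (`pfam_start`, `pfam_end`, and so on) are assumptions based on the usual Ensembl BioMart naming; check them against the attribute list in the BioMart web interface before relying on them. The function names are hypothetical. The start and end values returned are amino acid positions on the translation, not genomic coordinates.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint of the Ensembl BioMart web service.
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"

# Assumed attribute names; verify them in the BioMart interface.
DOMAIN_ATTRIBUTES = (
    "ensembl_gene_id",
    "ensembl_peptide_id",
    "pfam",
    "pfam_start",
    "pfam_end",
)


def build_domain_query(pfam_ids, dataset="hsapiens_gene_ensembl"):
    """Return a BioMart XML query string for the given Pfam IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Assumed filter name "pfam"; multiple IDs are comma-separated.
    ET.SubElement(ds, "Filter", {"name": "pfam", "value": ",".join(pfam_ids)})
    for attr in DOMAIN_ATTRIBUTES:
        ET.SubElement(ds, "Attribute", {"name": attr})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def fetch_domains(pfam_ids):
    """POST the query to BioMart and return the TSV response as text."""
    data = urllib.parse.urlencode({"query": build_domain_query(pfam_ids)}).encode()
    with urllib.request.urlopen(BIOMART_URL, data=data) as resp:
        return resp.read().decode()
```

Calling `fetch_domains(["PF01762"])` would need network access. For the PANTHER hits, the same pattern should apply with the corresponding PANTHER attributes, which is again an assumption to verify.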

5.4 years ago
GenoMax 148k

You can use Entrez Direct (EDirect) to get this information.

$ esearch -db cdd -query "PF01762" | elink -target protein | esummary -format ft

The output should look something like this (truncated). The Region entry denotes the domain, and its coordinates are amino acid positions:

>Feature ref|XP_021018358.1|
1       350     Protein
                        product N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase 4
106     295     Region
                        region  Galactosyl_T
                        note    Galactosyltransferase
                        db_xref CDD:328824
1       350     CDS
                        product N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase 4
                        protein_id      ref|XP_021018358.1|
                        db_xref GeneID:110294452

>Feature ref|XP_021015436.1|
1       325     Protein
                        product beta-1,3-galactosyltransferase 6
65      256     Region
                        region  Galactosyl_T
                        note    Galactosyltransferase
                        db_xref CDD:328824
1       325     CDS
                        product beta-1,3-galactosyltransferase 6
                        protein_id      ref|XP_021015436.1|
                        db_xref GeneID:110292468
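To automate the step from this output to codon coordinates, a minimal sketch is given below. It parses feature-table text like the above, keeps the Region entries, and converts the amino acid range into a 1-based nucleotide range within the CDS. The function names are hypothetical. The conversion assumes that amino acid 1 corresponds to the first codon of the CDS and that there are no frameshifts. Mapping further to genomic positions would need the exon structure of the transcript, which this sketch does not cover.

```python
def parse_regions(ft_text):
    """Collect Region features from feature-table text.

    Returns a list of dicts with keys: protein, name, aa_start, aa_end.
    """
    regions = []
    accession = None
    current = None  # the Region whose qualifier lines are being read
    for line in ft_text.splitlines():
        if line.startswith(">Feature"):
            # Header looks like ">Feature ref|XP_021018358.1|"
            parts = line.split("|")
            accession = parts[1] if len(parts) > 1 else line.split()[-1]
            current = None
            continue
        fields = line.split()
        if not fields:
            continue
        if not line[0].isspace():
            # Feature line: start, end, feature key
            current = None
            if len(fields) >= 3 and fields[2] == "Region":
                current = {
                    "protein": accession,
                    "name": None,
                    # Partial features may carry "<" or ">" markers
                    "aa_start": int(fields[0].lstrip("<>")),
                    "aa_end": int(fields[1].lstrip("<>")),
                }
                regions.append(current)
        elif current is not None and fields[0] == "region":
            # Indented qualifier line holding the domain name
            current["name"] = " ".join(fields[1:])
    return regions


def aa_to_cds_range(aa_start, aa_end):
    """Convert a 1-based amino acid range to a 1-based CDS nucleotide range."""
    return 3 * (aa_start - 1) + 1, 3 * aa_end
```

For the first record above, the Galactosyl_T region at amino acids 106 to 295 corresponds to CDS nucleotides 316 to 885 under these assumptions.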