Question

How to find transmembrane domain topological direction (extracellular/intracellular) from a UniProt text file?

10.4 years ago

Good Gravy ▴ 20

I am trying to order the residues of transmembrane domains from extracellular to intracellular, rather than from N-terminus to C-terminus. Currently I am isolating the domains with a Biopython script similar to the one in this post.

Originally I was writing a script to tell extracellular regions apart from cytoplasmic ones. However, in the text files both appear to carry the same ID (sample below; ECO:0000255).

FT   CHAIN         1    964       Ankyrin repeat and LEM domain-containing
FT                                protein 2.
FT                                /FTId=PRO_0000280243.
FT   TOPO_DOM      1      7       Extracellular. {ECO:0000255}.
FT   TRANSMEM      8     28       Helical; Signal-anchor for type III
FT                                membrane protein. {ECO:0000255}.
FT   TOPO_DOM     29    964       Cytoplasmic. {ECO:0000255}.
FT   DOMAIN       71    115       LEM. {ECO:0000255|PROSITE-
FT                                ProRule:PRU00313}.

Is there any software or programmatic way (preferably a Biopython module; perhaps there is a method in SeqIO that I have missed) to pull the in/out direction from this type of UniProt text file? What would the annotation look like in the text file?

biopython sequence uniprot • 3.9k views

updated 3.2 years ago by Ram 45k • written 10.4 years ago by Good Gravy ▴ 20

The text between {} is an ECO (Evidence Code Ontology) code, in this case "match to sequence model evidence used in manual assertion".

updated 3.2 years ago by Ram 45k • written 10.4 years ago by me ▴ 760

Ram · Answer 1 · 2015-02-24

I think you need to look at the TOPO_DOM features on both sides of each TRANSMEM feature (they label residues as extracellular or cytoplasmic) to infer which direction the TRANSMEM runs (into the cell or out of it).

For example, an extracellular TOPO_DOM, then a TRANSMEM, then a cytoplasmic TOPO_DOM, as in the quoted example, means the transmembrane domain runs (N-terminus to C-terminus) from outside the cell to inside.

As noted by user "@me", the identifiers in the curly brackets are evidence codes, not identifiers for each feature.
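
A minimal sketch of the flanking-TOPO_DOM inference, in plain Python so it does not depend on any particular parser. It assumes the older fixed-column FT layout shown in the question (newer UniProt releases write feature locations and notes differently, so the regular expression would need adjusting). The names `parse_features`, `transmem_orientations` and `outside_to_inside` are made up for illustration, and only "Cytoplasmic" and "Extracellular" are treated as known sides.

```python
import re

# First line of a feature in the older flat-file layout (an assumption):
# "FT   KEY   start   end   description"
FEATURE_START = re.compile(r"^FT {3}(\S+)\s+(\d+)\s+(\d+)\s*(.*)$")

SAMPLE = """\
FT   CHAIN         1    964       Ankyrin repeat and LEM domain-containing
FT                                protein 2.
FT                                /FTId=PRO_0000280243.
FT   TOPO_DOM      1      7       Extracellular. {ECO:0000255}.
FT   TRANSMEM      8     28       Helical; Signal-anchor for type III
FT                                membrane protein. {ECO:0000255}.
FT   TOPO_DOM     29    964       Cytoplasmic. {ECO:0000255}.
FT   DOMAIN       71    115       LEM. {ECO:0000255|PROSITE-
FT                                ProRule:PRU00313}.
"""


def parse_features(text):
    """Return (key, start, end, description) tuples from FT lines."""
    features = []
    for line in text.splitlines():
        match = FEATURE_START.match(line)
        if match:
            key, start, end, desc = match.groups()
            features.append([key, int(start), int(end), desc.strip()])
        elif line.startswith("FT    ") and features:
            # Continuation line: extend the description of the last feature.
            features[-1][3] += " " + line[2:].strip()
    return [tuple(f) for f in features]


def _side(description):
    """Text before the first period, e.g. 'Extracellular'; None if missing."""
    return description.split(".")[0].strip() if description else None


def transmem_orientations(features):
    """For each TRANSMEM, report the TOPO_DOM label on its N and C sides."""
    topo_by_end = {end: desc for key, _, end, desc in features if key == "TOPO_DOM"}
    topo_by_start = {start: desc for key, start, _, desc in features if key == "TOPO_DOM"}
    results = []
    for key, start, end, _ in features:
        if key == "TRANSMEM":
            n_side = _side(topo_by_end.get(start - 1))   # domain ending just before
            c_side = _side(topo_by_start.get(end + 1))   # domain starting just after
            results.append((start, end, n_side, c_side))
    return results


def outside_to_inside(sequence, start, end, n_side, c_side):
    """Return the TRANSMEM residues ordered extracellular -> cytoplasmic."""
    segment = sequence[start - 1:end]  # UniProt positions are 1-based, inclusive
    if n_side == "Cytoplasmic" or c_side == "Extracellular":
        return segment[::-1]  # helix runs inside -> outside, so flip it
    return segment  # already outside -> inside, or sides unknown (left as-is)


print(transmem_orientations(parse_features(SAMPLE)))
# [(8, 28, 'Extracellular', 'Cytoplasmic')]
```

For the quoted entry the helix at 8-28 already runs outside to inside, so `outside_to_inside` would return it unchanged; a helix flanked the other way round would be reversed. Entries with other compartment labels (for example lumenal sides) or with uncertain positions are not handled here and would need their own rules.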