I am trying to order the residues of transmembrane domains from extracellular to intracellular, rather than from N-terminus to C-terminus, when writing them out as FASTA sequences. Currently I am isolating the domains using a Biopython script similar to the one in this post.
Originally I was writing a script that could distinguish extracellular from cytoplasmic locations. However, in the text files both appear to carry the same ID (sample below; ECO:0000255).
FT CHAIN 1 964 Ankyrin repeat and LEM domain-containing
FT protein 2.
FT /FTId=PRO_0000280243.
FT TOPO_DOM 1 7 Extracellular. {ECO:0000255}.
FT TRANSMEM 8 28 Helical; Signal-anchor for type III
FT membrane protein. {ECO:0000255}.
FT TOPO_DOM 29 964 Cytoplasmic. {ECO:0000255}.
FT DOMAIN 71 115 LEM. {ECO:0000255|PROSITE-
FT ProRule:PRU00313}.
Is there any software or programmatic way (preferably a Biopython module; perhaps there is a method in SeqIO that I have missed) to pull the inside/outside (I/O) direction from this type of UniProt text file? What would the annotation be in the text file?
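The text between {} is an ECO (Evidence Code Ontology) code, in this case ECO:0000255, a "match to sequence model evidence used in manual assertion". It describes how the annotation was derived, not where the residues sit, which is why both topological domains share it. The location itself is the description text of each TOPO_DOM line ("Extracellular." or "Cytoplasmic.").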
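As a rough illustration of reading the direction from the TOPO_DOM descriptions, here is a minimal sketch in plain Python. It assumes the older flat-file layout shown in the sample above (key, start, end, description on one FT line) and only treats the label "Cytoplasmic" as the inside; other labels would need their own handling. The function names (`topology_features`, `transmembrane_orientations`, `outside_to_inside`) are hypothetical helpers, not part of Biopython. Biopython's `Bio.SwissProt` parser also exposes the feature table of these records, but the shape of each feature depends on the Biopython version, so it is not used here.

```python
import re

# Assumed old-style feature line: "FT  KEY  start  end  description"
FT_LINE = re.compile(r"^FT\s+(TOPO_DOM|TRANSMEM)\s+(\d+)\s+(\d+)\s+(.*)$")

# Feature lines copied from the sample record in the question
SAMPLE = [
    "FT TOPO_DOM 1 7 Extracellular. {ECO:0000255}.",
    "FT TRANSMEM 8 28 Helical; Signal-anchor for type III",
    "FT membrane protein. {ECO:0000255}.",
    "FT TOPO_DOM 29 964 Cytoplasmic. {ECO:0000255}.",
    "FT DOMAIN 71 115 LEM. {ECO:0000255|PROSITE-",
    "FT ProRule:PRU00313}.",
]


def topology_features(lines):
    """Return (key, start, end, label) for TOPO_DOM and TRANSMEM lines."""
    features = []
    for line in lines:
        match = FT_LINE.match(line)
        if match:
            key, start, end, description = match.groups()
            # Keep the text before the first "." or ";", e.g. "Extracellular"
            label = description.split(".")[0].split(";")[0].strip()
            features.append((key, int(start), int(end), label))
    return features


def transmembrane_orientations(features):
    """For each TRANSMEM segment, report the labels of the flanking TOPO_DOMs."""
    ordered = sorted(features, key=lambda feature: feature[1])
    result = []
    for i, (key, start, end, _) in enumerate(ordered):
        if key != "TRANSMEM":
            continue
        before = after = None
        if i > 0 and ordered[i - 1][0] == "TOPO_DOM":
            before = ordered[i - 1][3]
        if i + 1 < len(ordered) and ordered[i + 1][0] == "TOPO_DOM":
            after = ordered[i + 1][3]
        result.append((start, end, before, after))
    return result


def outside_to_inside(sequence, start, end, n_terminal_side):
    """Slice a 1-based inclusive segment; reverse it if its N-terminal
    side is cytoplasmic, so the result reads from outside to inside."""
    segment = sequence[start - 1:end]
    return segment[::-1] if n_terminal_side == "Cytoplasmic" else segment


features = topology_features(SAMPLE)
print(transmembrane_orientations(features))
# [(8, 28, 'Extracellular', 'Cytoplasmic')]
```

For the sample record the helix at 8-28 already runs from outside to inside, so `outside_to_inside` would leave it unchanged; a helix whose preceding TOPO_DOM is labelled "Cytoplasmic" would be reversed.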