Feature type locations overlap in Biopython. Which number is correct?
9.6 years ago
Good Gravy ▴ 20

In Biopython, feature.location.end can be equal to next_feature.location.start. For example:

type: TRANSMEM
location: [187:208]
qualifiers:
    Key: description, Value: Helical. {ECO:0000255}.
type: TOPO_DOM
location: [208:411]
qualifiers:
    Key: description, Value: Extracellular. {ECO:0000255}.

Although there is some biological ambiguity in this example (residue 208), in other cases there is none. So which domain do the residues at the shared boundary actually belong to?

python biopython • 2.1k views
9.6 years ago
Peter 6.0k

Devon is right that a protein may well be annotated with overlapping domains; however, the domains in your example do NOT overlap. Biopython uses Python-style slicing notation, in which the start index is included and the end index is excluded, so [187:208] covers indices 187 to 207 and [208:411] begins at index 208. For example:

>>> example = "0123456789"
>>> example[3:6]
'345'
>>> example[6:9]
'678'

Also beware that Biopython and Python use zero-based counting, rather than the one-based counting you may be more used to. Note SwissProt/UniProt annotation files use one-based counting in their plain text and XML file formats.
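
As a rough illustration of the counting difference, here is a minimal sketch in plain Python (no Biopython needed). The helper name to_one_based is hypothetical, made up for this example; it assumes simple zero-based, half-open [start:end] spans like the two in the question.

```python
def to_one_based(start, end):
    """Convert a zero-based, half-open [start:end] span
    to a one-based, inclusive (first, last) pair."""
    # The start shifts by one; the exclusive end is already
    # equal to the one-based index of the last residue.
    return start + 1, end

# The two spans from the question, as (start, end) tuples.
transmem = (187, 208)
topo_dom = (208, 411)

print(to_one_based(*transmem))  # (188, 208)
print(to_one_based(*topo_dom))  # (209, 411)
```

In one-based terms the first feature ends at residue 208 and the second begins at residue 209, so no residue is shared.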

Very informative, thanks. If I have understood you correctly, this only needs to be taken into account when converting the location integers into human-readable positions, and is not a concern for the amino acid sequence itself? For example, would any amino acid be incorrectly printed twice by print(TRANSMEM_domain.extract(record.seq), TOPO_DOM_domain.extract(record.seq))? The cookbook isn't very clear on this.

Yes, in this example you'd need to be careful about "position 208" (Python zero-based counting) versus "position 209" (more human-friendly one-based counting), which is the first amino acid in the TOPO_DOM feature.

The .extract(...) method follows the same slicing convention, so it would do the right thing: no amino acid is returned twice.
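
To see why adjacent features do not duplicate a residue, here is a small sketch using plain string slicing as a stand-in for extracting a simple, single-span feature. The sequence below is a made-up placeholder (not a real protein), chosen only to be long enough for the coordinates in the question.

```python
# Placeholder sequence of 420 residues; purely illustrative.
seq = "ACDEFGHIKLMNPQRSTVWY" * 21

# Slices that mirror the two feature locations from the question.
transmem_residues = seq[187:208]
topo_residues = seq[208:411]

# Lengths are simply end - start.
print(len(transmem_residues), len(topo_residues))  # 21 203

# Joining the two pieces reproduces the whole region exactly once.
print(transmem_residues + topo_residues == seq[187:411])  # True

# No zero-based index belongs to both spans.
shared = set(range(187, 208)) & set(range(208, 411))
print(shared)  # set()
```

Index 208 appears only in the second slice, which matches the point above that it is the first amino acid of the TOPO_DOM feature.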

9.6 years ago

Both. There's no reason that a given residue can't belong to more than one domain, particularly where they meet. An extracellular domain is a good example of that, since you could have helices/sheets that are extracellular (or intracellular, for that matter).

In any case, this is less a Biopython question than one for whoever produced the annotation that you're looking at. Biopython is typically just parsing that and presenting it to you in a convenient manner.
