Since transposable elements move around in a genome, how exactly is their position represented in a genome sequence?
I'm not sure if I fully understand your question, but I'll give it a shot:
Actually, most transposable elements are no longer capable of moving around in the genome. Only some Alu, L1, SVA and (HERV?) families are still capable of transposing in the human genome. The rest of the transposable elements (TEs) have had a more or less fixed position in the genome (ignoring genomic rearrangement events etc.), at least since the human-chimp divergence.
Of the active families, only L1 is an autonomous TE, with coding regions for endonuclease and reverse transcriptase activity. These proteins form a complex with L1 mRNA; the complex inserts at a nicked DNA site, where the mRNA is reverse-transcribed back into DNA during insertion. This process is not always successful and often leaves truncated L1 copies at the insertion site, which are no longer capable of transposing. SVA and Alu have no coding regions of their own, but they can be inserted elsewhere in the DNA by this L1 machinery. So even most elements belonging to the active "L1" family are no longer capable of transposing.
Elements that are capable of transposing are mostly expressed in the germline. So for one individual, even if some elements have transposed, the genomic position is fixed in at least most cells. In cancer tissue, TEs can be expressed again and can transpose into genes, which can promote carcinogenesis. In healthy tissue, to my knowledge, somatic expression of TEs has only been shown in neuronal tissue. In that case, the genomic position of a TE is not always the same in each neuronal cell. But in general, I believe it's safe to say that for one individual the TEs have a fixed position in a genome sequence.
Between individuals, of course, the genomic position is not the same for a small subset of TEs. This variation is documented in the database for mobile element insertion polymorphisms, called dbRIP (http://dbrip.brocku.ca/).
The way in which transposable elements (TEs) are represented (i.e. annotated) depends first on whether you are asking about (i) TEs that are present in the reference sequence or (ii) TEs that are not present in the reference sequence (i.e. identified in another strain/species) whose location has been mapped to reference coordinates.
For reference TEs, TEs are annotated like any other annotated feature, i.e. using a range of base or interbase coordinates depending on the annotation model. Sometimes fragments of a reference TE can be linked together using the same ID to represent an ancestral TE insertion that has been fragmented by divergence or insertion of another TE. Defragmented TEs can thus be viewed like exons of a gene model.
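To make the two ideas in the previous paragraph concrete, here is a minimal Python sketch. It assumes the common conventions that base coordinates are 1-based and inclusive (as in GFF3) and interbase coordinates are 0-based and half-open (as in BED). The feature tuples, IDs and function names are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def base_to_interbase(start, end):
    """Convert a 1-based inclusive range to a 0-based half-open range."""
    return start - 1, end


def group_fragments(features):
    """Group TE fragments that share an ID, sorted by sequence and start."""
    groups = defaultdict(list)
    for te_id, seqid, start, end, strand in features:
        groups[te_id].append((seqid, start, end, strand))
    return {
        te_id: sorted(frags, key=lambda f: (f[0], f[1]))
        for te_id, frags in groups.items()
    }


# Hypothetical annotation: TE_0001 was split in two by the later
# insertion of TE_0002, so both of its fragments carry the same ID.
features = [
    ("TE_0001", "chr2L", 1000, 1499, "+"),
    ("TE_0002", "chr2L", 1500, 2199, "-"),
    ("TE_0001", "chr2L", 2200, 2899, "+"),
]

grouped = group_fragments(features)
for te_id, frags in grouped.items():
    print(te_id, frags)
```

Grouping by ID is what lets the two pieces of TE_0001 be treated as one ancestral insertion, much like the exons of a gene model.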
For non-reference TEs, things are much trickier. As we detail in this poster, different tools for identifying non-reference TEs use different annotation schemes. Some annotate TEs to a single base pair, some annotate to a span representing uncertainty in location, some represent TEs as a point (base, interbase) between the ranges of the mapping uncertainty, and others represent TEs as the range of the target site duplication (TSD). As yet, there is no consensus in the field on how to represent non-reference TE insertions.
As I detail in this article, there is a problem with annotating non-reference TEs as a single point, which arises from (i) the fact that TEs (like all insertions) are biologically inter-base features and (ii) TEs often generate a TSD upon insertion, which can lead to ambiguity about where to place the TE insertion in the genome when mapping them from the 5' vs the 3' end. The combination of (i) and (ii) makes it very difficult to precisely annotate TEs on base coordinate systems to a single nucleotide position.
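The ambiguity described above can be reproduced with a toy example. The following Python sketch uses made-up sequences and hypothetical helper names: it inserts a TE flanked by a TSD into a short reference, then estimates the insertion point separately from the upstream flank and from the downstream flank. It is only an illustration of the principle, not how any particular mapper works.

```python
def common_prefix_len(a, b):
    """Length of the longest shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def insert_with_tsd(ref, tsd_start, tsd_end, te):
    """Insert `te` so that ref[tsd_start:tsd_end] is duplicated on both sides.

    Coordinates are interbase (0-based, half-open).
    """
    return ref[:tsd_end] + te + ref[tsd_start:]


ref = "ACGTACGGATTCAGCTTAGC"   # made-up reference sequence
te = "TGTGTGTC"                # made-up TE sequence
sample = insert_with_tsd(ref, 8, 12, te)  # pre-TSD sequence is ref[8:12]

# Interbase junction implied by the upstream flank:
# the flank matches the reference through the end of the TSD.
from_upstream = common_prefix_len(sample, ref)

# Interbase junction implied by the downstream flank:
# the flank matches the reference from the start of the TSD.
from_downstream = len(ref) - common_prefix_len(sample[::-1], ref[::-1])

print(from_upstream, from_downstream)
```

Here the two estimates are interbase positions 12 and 8, which differ by exactly the TSD length of 4. Neither single point is more correct than the other, which is why a single-nucleotide annotation is hard to define.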
Based on the fact that most genome annotation systems use base coordinates, I advocate that non-reference TEs be annotated as the span of the sequence in the reference genome that becomes the TSD (i.e. the pre-TSD sequence) in the sample containing the TE, with the orientation of the TE represented in the strand field. This is how one of the top-performing non-reference TE mappers, RelocaTE, does things, and I hope it becomes a standard in the field.
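As a rough sketch of what such an annotation could look like, the Python function below writes one GFF3-style line whose start and end are the pre-TSD span (1-based, inclusive) and whose strand field holds the TE orientation. The function name, the source value, the attribute layout and the choice of feature type are my assumptions for illustration; this is not the actual output format of RelocaTE or any other tool.

```python
def tsd_to_gff3(seqid, tsd_start, tsd_end, strand, te_family, source="example"):
    """Format a non-reference TE insertion as one GFF3-style line.

    tsd_start and tsd_end are the 1-based inclusive reference coordinates
    of the pre-TSD sequence; strand is the orientation of the inserted TE.
    """
    if strand not in {"+", "-", "."}:
        raise ValueError("strand must be '+', '-' or '.'")
    if tsd_start > tsd_end:
        raise ValueError("tsd_start must not exceed tsd_end")
    # Hypothetical attribute layout; real tools may use different keys.
    attributes = f"ID={te_family}_{seqid}_{tsd_start}_{tsd_end};Name={te_family}"
    columns = [
        seqid,
        source,
        "transposable_element_insertion_site",  # assumed feature type
        str(tsd_start),
        str(tsd_end),
        ".",       # score
        strand,    # TE orientation
        ".",       # phase
        attributes,
    ]
    return "\t".join(columns)


# Hypothetical insertion with a 4 bp pre-TSD span on the minus strand.
line = tsd_to_gff3("chr2L", 1009, 1012, "-", "exampleTE")
print(line)
```

Because the span covers the whole pre-TSD sequence, the same record is obtained whether the insertion is mapped from the 5' or the 3' end.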
True; however, in some species (e.g. maize) transposable elements are still actively transposing. To answer the OP's question, there is no GFF3 convention, as far as I know, to indicate elements that are (or can be) still actively transposing.