Question

RepeatMasker UCSC track

4 months ago

frarodmar17 • 0

I am analysing repetitive sequences in transcriptomics data and have been trying to use a RepeatMasker annotation GTF file, but I realized that there are different "versions" of each ID. For example, AluY (the gene_id in the RepeatMasker annotation file) covers several different sequences, such as AluY_dup1, AluY_dup2, etc. These sequences are located in different regions of the genome and they have different lengths. Is this an error in the annotation file, or are there repetitive sequences that can vary in length and location but belong to the same "sequence group" in the RepeatMasker annotation?

RepeatMasker • 512 views

updated 4 months ago by Michael 56k • written 4 months ago by frarodmar17 • 0

score 0 · Answer 1 · 2025-03-14

> These sequences are located in different regions of the genome and they have different lengths.

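Err, you are talking about repeats here, so there are necessarily multiple copies of each repeat element in the genome. The _dup1, _dup2, ... suffixes most likely just give each genomic copy its own unique ID.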

> Is this an error in the annotation file, or are there repetitive sequences that can vary in length and location but belong to the same "sequence group" in the RepeatMasker annotation?

No, this is not an error. It is the essential property of transposable elements (TEs). Active TEs can "jump around" or "hitch-hike" in the genome, changing their copy number and location, and possibly their length, e.g. through incomplete integrations. Large portions of a genome's TE content are inactive, however, and these sequences can degenerate through random mutations. Copies can also be removed in part or as a whole, e.g. by gene conversion.
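If you want to see this in your own file, here is a minimal sketch that counts the copies per gene_id and reports their length range. It assumes a tab-separated GTF with gene_id and transcript_id attributes in the last column. The sample lines and coordinates below are invented for illustration, and `summarise_copies` is a hypothetical helper, not part of any RepeatMasker tool.

```python
import re
from collections import defaultdict

# Hypothetical GTF lines in the style of a RepeatMasker-derived annotation.
# Coordinates are invented for illustration only.
SAMPLE_GTF = """\
chr1\trmsk\texon\t1001\t1300\t.\t+\t.\tgene_id "AluY"; transcript_id "AluY_dup1";
chr2\trmsk\texon\t5001\t5150\t.\t-\t.\tgene_id "AluY"; transcript_id "AluY_dup2";
chr3\trmsk\texon\t200\t510\t.\t+\t.\tgene_id "AluY"; transcript_id "AluY_dup3";
chr1\trmsk\texon\t9001\t9100\t.\t+\t.\tgene_id "L1PA2"; transcript_id "L1PA2_dup1";
"""

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn 'gene_id "AluY"; transcript_id "AluY_dup1";' into a dict."""
    return dict(ATTR_RE.findall(field))


def summarise_copies(gtf_text):
    """Return copy number and min/max length for every gene_id."""
    lengths = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        # GTF coordinates are 1-based and inclusive.
        lengths[attrs["gene_id"]].append(end - start + 1)
    return {
        name: {"copies": len(v), "min_len": min(v), "max_len": max(v)}
        for name, v in lengths.items()
    }


summary = summarise_copies(SAMPLE_GTF)
for name, stats in summary.items():
    print(name, stats)
```

On a real annotation you would read the file line by line instead of using the sample string. Expect thousands of copies for common repeats such as AluY, with a wide spread of lengths.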

The "sequence groups" are called repeat families, such as Alu repeats, Tc1/Mariner, etc.
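To make the grouping concrete, here is a rough sketch of how individual copies roll up to repeat names, families and classes. It assumes you have already extracted these four fields per copy. Some RepeatMasker-derived GTFs carry the family and class as extra attributes such as family_id and class_id, but check your own file. The records and the `count_by_level` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records: (copy ID, repeat name, family, class).
RECORDS = [
    ("AluY_dup1", "AluY", "Alu", "SINE"),
    ("AluY_dup2", "AluY", "Alu", "SINE"),
    ("AluSx_dup1", "AluSx", "Alu", "SINE"),
    ("Tigger1_dup1", "Tigger1", "TcMar-Tigger", "DNA"),
]


def count_by_level(records, level):
    """Count copies at the chosen level: 'name', 'family' or 'class'."""
    index = {"name": 1, "family": 2, "class": 3}[level]
    return Counter(rec[index] for rec in records)


by_name = count_by_level(RECORDS, "name")
by_family = count_by_level(RECORDS, "family")
by_class = count_by_level(RECORDS, "class")

print(by_name)
print(by_family)
print(by_class)
```

For expression analysis, the same roll-up is what lets you sum read counts over all copies of AluY, or over the whole Alu family, instead of treating each _dup entry separately.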