RepeatMasker UCSC track

I am analysing repetitive sequences in transcriptomics data and have been trying to use a RepeatMasker annotation GTF file, but I realized that there are different "versions" of each ID. For example, AluY (the gene_id in the RepeatMasker annotation file) covers several different sequences, such as AluY_dup1, AluY_dup2, etc. These sequences are located in different regions of the genome and they have different lengths. Is this an error in the annotation file, or are there repetitive sequences that can vary in length and location but belong to the same "sequence group" in the RepeatMasker annotation?

Michael 55k

> These sequences are located in different regions of the genome and they have different lengths.

Err, you are talking about repeats here, so there are necessarily multiple copies of each repeat element.

> Is this an error in the annotation file, or are there repetitive sequences that can vary in length and location but belong to the same "sequence group" in the RepeatMasker annotation?

No, this is not an error. It is an essential property of transposable elements (TEs). Active TEs can "jump around" or "hitch-hike" in the genome, changing their copy number and location, and possibly their length, e.g. through incomplete integration. Large portions of the genome's TE content are inactive, however, and those sequences can degenerate through random mutation. Copies can also be removed in part or as a whole, e.g. by gene conversion.

The "sequence groups" are called repeat families, such as Alu repeats, Tc1/Mariner, etc.
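To see this in your own annotation, you could group the GTF records by gene_id and look at how many copies each repeat has and how much their lengths vary. The following is only a rough sketch: it assumes the file is standard tab-separated GTF with a `gene_id` attribute (as in the question) and that the per-copy names such as AluY_dup1 sit in a `transcript_id` attribute, which may differ in your file. The function name `summarise_copies` and the example records (coordinates included) are made up for illustration.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def summarise_copies(gtf_lines):
    """Count copies per gene_id and report the range of their lengths."""
    lengths = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        start, end = int(cols[3]), int(cols[4])
        gene_id = parse_attributes(cols[8]).get("gene_id")
        if gene_id is None:
            continue
        # GTF coordinates are 1-based and inclusive
        lengths[gene_id].append(end - start + 1)
    return {
        gene_id: {"copies": len(v), "min_len": min(v), "max_len": max(v)}
        for gene_id, v in lengths.items()
    }


# Hypothetical records, only to show the expected layout.
example = [
    'chr1\trmsk\texon\t1000\t1299\t.\t+\t.\tgene_id "AluY"; transcript_id "AluY_dup1";',
    'chr5\trmsk\texon\t5000\t5149\t.\t-\t.\tgene_id "AluY"; transcript_id "AluY_dup2";',
    'chr2\trmsk\texon\t200\t1199\t.\t+\t.\tgene_id "L1HS"; transcript_id "L1HS_dup1";',
]

summary = summarise_copies(example)
print(summary["AluY"])
```

On a real file you would pass the open file handle instead of the list. Seeing many copies with a wide spread of lengths under one gene_id is what you would expect for a repeat, not a sign of a broken annotation.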
