Meaning of the 5th column in repeatmasker BED format results
5.9 years ago
Vitis ★ 2.6k

I'd like to ask about the meaning of the 5th column (an integer) in the BED format results from RepeatMasker.

9   100663131   100663387   LTR22_SS        475     +
9   70254161    70254685    ALTR2B_SSc      3460    +
9   96468811    96469391    LTR8_SSc        3756    -
9   116391614   116392469   LTR78           1152    +
9   4980341     4980930     LTR39_SSc       3300    -
9   16908116    16908359    MamGypLTR3      512     +
9   17432426    17432886    ALTR2B2_SSc     1914    -
9   18742941    18743430    LTR5_SS         2771    +
9   27131556    27132076    ERV3-1_SSc-LTR  969     -
9   30539515    30539909    LTR39B3_SSc     787     -

I searched around but did not find a definitive answer, so I am seeking help here from RepeatMasker experts. It is clearly not the length of the repeat. Does it indicate how repetitive this feature is in the genome? Or is it some identity score indicating how well it matches a repeat consensus?


Column 5 is an optional field in BED files. The description of field 5 from UCSC is below:

5. Score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray).

You may already know that. I'm not sure how the scores are determined here, sorry.
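
As a quick sanity check, the column 5 values from the question can be compared against the 0 to 1000 range quoted above. This is only a sketch: the rows are copied from the question, and the function name `scores_outside_ucsc_range` is made up for illustration.

```python
# Rows copied from the question (whitespace-separated BED6 fields).
BED_ROWS = """\
9 100663131 100663387 LTR22_SS 475 +
9 70254161 70254685 ALTR2B_SSc 3460 +
9 96468811 96469391 LTR8_SSc 3756 -
9 116391614 116392469 LTR78 1152 +
9 4980341 4980930 LTR39_SSc 3300 -
9 16908116 16908359 MamGypLTR3 512 +
9 17432426 17432886 ALTR2B2_SSc 1914 -
9 18742941 18743430 LTR5_SS 2771 +
9 27131556 27132076 ERV3-1_SSc-LTR 969 -
9 30539515 30539909 LTR39B3_SSc 787 -
"""


def scores_outside_ucsc_range(bed_text, low=0, high=1000):
    """Return (name, score) pairs whose column 5 value lies outside [low, high]."""
    outliers = []
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        name, score = fields[3], int(fields[4])
        if not low <= score <= high:
            outliers.append((name, score))
    return outliers


outliers = scores_outside_ucsc_range(BED_ROWS)
for name, score in outliers:
    print(name, score)
```

Several of the values exceed 1000, which suggests the column holds a raw score from RepeatMasker rather than a score scaled for Genome Browser display.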

5.9 years ago

It is the Smith-Waterman alignment score of the match between the genomic sequence and the repeat consensus sequence.

I think (but am not 100% certain) a higher score means a longer or closer match to the repeat consensus, so it reflects the strength of the match rather than how repetitive the feature is in the genome. It is used for cutoff filters, which are specific to different classes of repeats.

I don't know how useful it is to use these scores directly. Also see "How to read the results" in the RepeatMasker documentation:

Smith-Waterman score of the match, usually complexity adjusted. The SW scores are not always directly comparable. Sometimes the complexity adjustment has been turned off, and a variety of scoring-matrices are used.
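
If you do want to filter on the score, a minimal sketch might look like the following. It assumes a six-column BED file like the one in the question. The `min_score` value of 1000 is an arbitrary illustration, not a RepeatMasker default, and the names `Repeat`, `parse_bed6`, and `filter_by_score` are hypothetical.

```python
from typing import NamedTuple


class Repeat(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int   # column 5: Smith-Waterman score reported by RepeatMasker
    strand: str

    @property
    def length(self):
        # BED intervals are 0-based and half-open, so length is end - start.
        return self.end - self.start


def parse_bed6(text):
    """Parse whitespace-separated BED6 lines into Repeat records."""
    repeats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or incomplete lines
        chrom, start, end, name, score, strand = fields[:6]
        repeats.append(Repeat(chrom, int(start), int(end), name, int(score), strand))
    return repeats


def filter_by_score(repeats, min_score):
    """Keep records whose score is at least min_score."""
    return [r for r in repeats if r.score >= min_score]


# First four rows from the question.
sample = """\
9 100663131 100663387 LTR22_SS 475 +
9 70254161 70254685 ALTR2B_SSc 3460 +
9 96468811 96469391 LTR8_SSc 3756 -
9 116391614 116392469 LTR78 1152 +
"""

repeats = parse_bed6(sample)
strong = filter_by_score(repeats, min_score=1000)  # arbitrary example threshold
for r in strong:
    print(r.name, r.score, r.length)
```

Given the caveat in the documentation, a single threshold is probably only meaningful within one RepeatMasker run that used the same settings and scoring matrices.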

It may be worthwhile to contact the developers directly.