11.5 years ago
AndreiR
Hello,
I'm trying to reproduce some repeat annotations shown in the UCSC Genome Browser. I was careful to match the versions of RepeatMasker and RepBase. I understand that UCSC runs RepeatMasker with -s. However, when I retrieve the DNA sequence from the UCSC window and run RepeatMasker on it, I can't find the same events that are shown. I was wondering whether the RepeatMasker search engine may be the issue. I'm using NCBI/RMBLAST [ 2.2.27+ ].
Thanks in advance
It is very difficult to duplicate RepeatMasker results between runs. Independent runs produce similar results, but rarely are they identical. Are you using the standard RepeatMasker libraries or are you using sequences from RepBase?
I'm using the latest versions of RepeatMasker and RepBase. I've also tried the versions of RM and RepBase stated on GoldenPath to get closer, but with no success.
Is RM failing to recover the expected repeats, or are they being misidentified? For example, are you expecting a repeat to be an Alu SINE (based on the UCSC annotation) but RM calls it a MIR, or are your RM runs not annotating the region at all?
The former: I'm expecting one annotation and getting another.
Are you supplying the library using the '-lib' option, or are you using the '-species' option? Do you mind sharing the annotation you are expecting versus what RM is giving you (just the repeat type)? If your annotation differs at or below the subfamily level, there may not be much you can do about it. It could be that RM is unable to determine whether the repeats in question come from subfamily A or subfamily B.
Have you tried looking at the alignments manually? It may be worth comparing what UCSC has for one of your misidentified repeats against your RM run. That may give you some clarification.
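If it helps with that comparison, here is a minimal Python sketch that reads a RepeatMasker .out table and reports, for each expected annotation, whether the best-overlapping RM call has the same name, differs only at the subfamily level, differs at the class/family level, or is missing. The names `Repeat`, `parse_rm_out` and `compare` are made up for illustration, and it assumes the expected (UCSC) annotations have already been converted to the same 1-based coordinates as the .out file.

```python
from typing import List, NamedTuple, Optional, Tuple


class Repeat(NamedTuple):
    seq: str      # query sequence name
    start: int    # 1-based start in the query
    end: int      # inclusive end in the query
    name: str     # matching repeat, e.g. AluSx
    family: str   # repeat class/family, e.g. SINE/Alu


def parse_rm_out(text: str) -> List[Repeat]:
    """Parse the whitespace-separated rows of a RepeatMasker .out table."""
    repeats = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        repeats.append(
            Repeat(fields[4], int(fields[5]), int(fields[6]), fields[9], fields[10])
        )
    return repeats


def compare(
    expected: List[Repeat], observed: List[Repeat]
) -> List[Tuple[Repeat, Optional[Repeat], str]]:
    """Pair each expected repeat with its best-overlapping observed repeat."""
    report = []
    for exp in expected:
        best, best_overlap = None, 0
        for obs in observed:
            if obs.seq != exp.seq:
                continue
            overlap = min(exp.end, obs.end) - max(exp.start, obs.start) + 1
            if overlap > best_overlap:
                best, best_overlap = obs, overlap
        if best is None:
            status = "missing"
        elif best.name == exp.name:
            status = "same"
        elif best.family == exp.family:
            status = "subfamily differs"
        else:
            status = "family differs"
        report.append((exp, best, status))
    return report
```

A "subfamily differs" result is the case described above where there may not be much to do; "family differs" and "missing" are the ones worth inspecting in the alignments.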
Thanks for your ideas. I'm supplying the library with -lib. In some examples, I found that using -qq versus -s changes the results. I'm working on it to see whether that explains all my problems. :)
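For anyone wanting to repeat this kind of test, a rough Python sketch of building the command lines for the different sensitivity settings is below. The helper `build_rm_command` and the file names are hypothetical; it only assembles the argument list (using RepeatMasker's -s, -q, -qq, -lib, -species and -engine options) and does not run anything. It treats -lib and -species as alternatives, as in the question above, which is an assumption about how you want to run it.

```python
from typing import List, Optional

# Sensitivity settings mapped to RepeatMasker's speed/sensitivity options.
SENSITIVITY_FLAGS = {
    "slow": "-s",      # most sensitive; the setting UCSC is said to use
    "default": None,   # no speed option given
    "quick": "-q",
    "rush": "-qq",     # fastest, least sensitive
}


def build_rm_command(
    fasta: str,
    sensitivity: str = "slow",
    lib: Optional[str] = None,
    species: Optional[str] = None,
    engine: str = "rmblast",
) -> List[str]:
    """Return the argument list for one RepeatMasker run (nothing is executed)."""
    if lib and species:
        raise ValueError("give either lib or species, not both")
    cmd = ["RepeatMasker", "-engine", engine]
    flag = SENSITIVITY_FLAGS[sensitivity]
    if flag:
        cmd.append(flag)
    if lib:
        cmd += ["-lib", lib]
    elif species:
        cmd += ["-species", species]
    cmd.append(fasta)
    return cmd


# Example: one command per sensitivity setting for the same input and library.
commands = [
    build_rm_command("region.fa", sensitivity=s, lib="custom_lib.fa")
    for s in SENSITIVITY_FLAGS
]
```

Each list can be passed to `subprocess.run`, and the resulting .out files compared to see which annotations change with the sensitivity setting alone.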
Where did you get this information?