Dear all:
I plan to do a comparative analysis of repeat elements across several rodents. For some rodents (e.g., mouse, rat), the annotation is directly available from the RepeatMasker website (e.g., for mouse: http://www.repeatmasker.org/species/musMus.html). For the other rodents no such annotation is available, so I have to run RepeatMasker myself. I have some concerns about how to do the comparison and see two possible approaches. Can anyone give me some suggestions? Thanks.
Choice 1: To make sure all the annotations are produced under the same conditions (e.g., the same repeat library and parameters), I run the repeat annotation for all species myself and then do the comparison.
Choice 2: For mouse and rat, I use the annotation files from the RepeatMasker website, because those annotations should be the standard. For the other rodents, I try to annotate them as well as possible (for example, by predicting species-specific repeat elements), and then compare.
Which one do you think makes more sense?
By the way, is there a package (or some code) that can help analyze the annotation file (see the example below) and compute the percentage of the genome covered by each kind of repeat element?
more mm10.fa.out
   SW  perc perc perc  query     position in query               matching    repeat         position in repeat
score  div. del. ins.  sequence  begin    end      (left)        repeat      class/family   begin   end   (left)  ID
14737   8.1  1.0  0.2  chr1      3000001  3000097  (192471874) C L1MdFanc_I  LINE/L1        (2987)  3586  3489    1
   27   0.0  0.0  0.0  chr1      3000098  3000123  (192471848) + (T)n        Simple_repeat  1       26    (0)     2
14737   8.1  1.0  0.2  chr1      3000124  3002128  (192469843) C L1MdFanc_I  LINE/L1        (3085)  3488  1467    1
Hi! The example file is hard for me to read. Maybe you could format it somehow? Regarding your choices, I suggest a mix of the two. Using the same approach for everything is advisable; otherwise you will never know whether the differences are real or due to different analysis options. However, assuming that the available annotation can be considered a gold standard, and that your data should in principle let you obtain an equally good annotation, you can run the masking yourself on all organisms and use the available mouse and rat annotations to find the optimal masking parameters.
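As for the percentage question: RepeatMasker writes a .tbl summary next to the .out file, which may already be enough. If you want to compute the numbers from the .out file directly, here is a minimal sketch in Python. The function names (parse_out, class_percentages) are my own and not part of any package. It assumes whitespace-separated rows with the 15 columns shown in your example, sums the query span (end - begin + 1) per class/family, and, if no genome size is given, estimates it as end + (left) for each sequence that has at least one hit. Overlapping hits are counted twice, so treat the result as an approximation.

```python
from collections import defaultdict


def parse_out(lines):
    """Yield (query, begin, end, left, repeat_class) for each data row."""
    for line in lines:
        fields = line.split()
        # Data rows have at least 15 columns and start with an integer score;
        # this skips the two header lines and any blank lines.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        query = fields[4]
        begin, end = int(fields[5]), int(fields[6])
        left = int(fields[7].strip("()"))  # bases remaining after the hit
        yield query, begin, end, left, fields[10]


def class_percentages(lines, genome_size=None):
    """Return {class/family: percent of genome} from RepeatMasker .out rows.

    Assumption: hits do not overlap. Overlapping hits are double counted.
    """
    bases = defaultdict(int)
    seq_len = {}
    for query, begin, end, left, rep_class in parse_out(lines):
        bases[rep_class] += end - begin + 1
        # end + (left) equals the full length of the query sequence.
        seq_len[query] = end + left
    if genome_size is None:
        # Assumption: every sequence has at least one hit; sequences
        # without hits are missing from this estimate.
        genome_size = sum(seq_len.values())
    return {c: 100.0 * n / genome_size for c, n in bases.items()}
```

You would call it with an open file handle, for example class_percentages(open("mm10.fa.out")), and pass genome_size explicitly if you know the assembly length. To group by class only (LINE, SINE, ...), split the key on "/" before summing.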