7.3 years ago
amy.bashir
Hello everyone!
I am doing repeat element annotation for a new genome. From what I have read online and in papers, the workflow is 1) Dustmasker, 2) TRF, 3) RepeatModeler and 4) RepeatMasker.
I just finished masking the low-complexity regions using Dustmasker. Should I use the hard-masked file as input for TRF, or use the original genome sequence file as input? Is it ever a good idea to use a masked sequence as input for another repeat-masking program?
Thank you very much!
RepeatModeler already uses Tandem Repeats Finder, so why are you running it before RepeatModeler? In any case, I wouldn't mask my data first. Masked data means that the repetitive sequences are hidden away.
I saw that RepeatModeler uses TRF, but I got 0% for "simple repeats" and "low complexity", so I wondered whether running TRF separately might show something different.
I am doing the repeat element annotation to see what percentage of the genome is repeat sequences, not to mask the sequence.
Also, it seems that most of the new whole-genome analysis papers that I have come across use both RepeatModeler+RepeatMasker and TRF, so I was wondering if they do different things.
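For the percentage question, one way to sanity-check the numbers independently of any tool's summary table is to count masked bases directly in the output FASTA. Here is a minimal Python sketch, assuming the masker writes soft-masked repeats as lowercase and hard-masked repeats as N (or X). `masked_fraction` is a hypothetical helper name, not part of any of the tools mentioned, and the example sequence is made up.

```python
from io import StringIO


def masked_fraction(fasta_handle):
    """Return (soft, hard, total) base counts from an open FASTA handle.

    Assumptions: soft-masked bases are lowercase, hard-masked bases are
    N or X (either case). Assembly gaps written as N are counted as
    hard-masked too, so subtract them if the assembly contains gaps.
    """
    soft = hard = total = 0
    for line in fasta_handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        for base in line:
            if base in "NnXx":
                hard += 1  # hard-masked (or gap) position
            elif base.islower():
                soft += 1  # soft-masked position
    return soft, hard, total


# Toy input standing in for a masked genome file (made-up sequence).
example = StringIO(">chr1\nACGTacgtNNNN\nacgtACGTACGT\n")
soft, hard, total = masked_fraction(example)
print(f"soft-masked: {100 * soft / total:.1f}%")
print(f"hard-masked: {100 * hard / total:.1f}%")
```

For a real genome you would pass `open("genome.masked.fa")` (a placeholder filename) instead of the `StringIO` object. Running it on the output of each tool separately would show how much each step contributes.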
I used the following workflow to annotate repetitive elements in my own work on a new genome: