Hello everyone, I am trying to analyze repeat sequences in a non-model organism (an insect) using RepeatModeler and RepeatMasker. I began by creating a RepeatModeler database for the species (species.DB):
BuildDatabase -name species.DB -engine NCBI species.fa
I then built the repeat models with:
RepeatModeler -database species.DB -engine ncbi -pa 16
I believe the next step is to use one of the output files (consensi.fa.classified) as the repeat library in RepeatMasker. However, none of the repeat library files produced by RepeatModeler has 'classified' in its name; I only have consensi.fa. I therefore used that file as the library in RepeatMasker, and all of the interspersed repeats it identified are listed under "Unclassified", as shown below.
What am I missing? Is there an additional step I need to run to get a classified list of the repeat sequences? Also, why does RepeatMasker assume the query species is "homo" (I'm assuming Homo sapiens)? Any suggestions will be highly appreciated. Thank you.
==================================================
file name:     species.fa
sequences:     107111
total length:  279238173 bp (275719484 bp excl N/X-runs)
GC level:      27.25 %
bases masked:  78241030 bp ( 28.02 %)
==================================================
               number of       length   percentage
               elements*     occupied  of sequence
--------------------------------------------------
SINEs:                 0            0 bp    0.00 %
    ALUs               0            0 bp    0.00 %
    MIRs               0            0 bp    0.00 %
LINEs:                 0            0 bp    0.00 %
    LINE1              0            0 bp    0.00 %
    LINE2              0            0 bp    0.00 %
    L3/CR1             0            0 bp    0.00 %
LTR elements:          0            0 bp    0.00 %
    ERVL               0            0 bp    0.00 %
    ERVL-MaLRs         0            0 bp    0.00 %
    ERV_classI         0            0 bp    0.00 %
    ERV_classII        0            0 bp    0.00 %
DNA elements:          0            0 bp    0.00 %
    hAT-Charlie        0            0 bp    0.00 %
    TcMar-Tigger       0            0 bp    0.00 %
Unclassified:     656772     85428964 bp   30.59 %
Total interspersed repeats:  85428964 bp   30.59 %
Small RNA:             0            0 bp    0.00 %
Satellites:            0            0 bp    0.00 %
Simple repeats:   245893     10827053 bp    3.88 %
Low complexity:        0            0 bp    0.00 %
==================================================
* most repeats fragmented by insertions or deletions
  have been counted as one element
The query species was assumed to be homo
RepeatMasker Combined Database: Dfam_Consensus-20171107, RepBase-20170127
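In case it helps with diagnosing this, below is a minimal sketch of how the library headers could be inspected. It assumes that a classified library labels each consensus in its FASTA header as ">name#Class/Family" (for example ">rnd-1_family-1#LINE/L1"), and that a header without a "#" label is what ends up under "Unclassified". The function name count_headers is just a hypothetical helper, not part of RepeatModeler or RepeatMasker.

```shell
#!/bin/sh
# Count FASTA headers in a repeat library, and how many of them carry a
# "#Class/Family" label directly after the sequence name.
# Assumption: classified libraries use headers like ">name#Class/Family".
# Note: a label such as "#Unknown" would still be counted as labelled here.
count_headers() {
    lib="$1"
    # grep -c prints 0 when nothing matches, so both counts are always set.
    total=$(grep -c '^>' "$lib")
    labelled=$(grep -c '^>[^[:space:]]*#' "$lib")
    echo "total=$total labelled=$labelled"
}

# Only run on the real library if it is present in the current directory.
if [ -f consensi.fa ]; then
    count_headers consensi.fa
fi
```

If "labelled" comes out as 0 for consensi.fa, that would at least be consistent with every family being reported as "Unclassified" in the table above.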