Error in repeatmodeler consensi file output
6.3 years ago

Hello everyone, I am trying to analyze repeat sequences in a non-model organism (an insect) using RepeatModeler and RepeatMasker. I started by creating a database for the species (species.DB) for RepeatModeler to build repeat models from:

BuildDatabase -name species.DB -engine NCBI species.fa

I then ran RepeatModeler with the following command:

RepeatModeler -database species.DB -engine ncbi -pa 16

I believe the next step is to use one of the output files (consensi.fa.classified) as the repeat library in RepeatMasker. However, none of the repeat library files that RepeatModeler produced have the word 'classified' in their names; I only have a consensi.fa file. I therefore used that file in RepeatMasker, and all the repeat sequences it identified are listed under "Unclassified", as shown below. What am I missing? Is there a step I should run to get a classified list of the repeat sequences? Also, is there any reason why it assumes the query species is "homo" (I'm assuming Homo sapiens)? Any suggestions will be highly appreciated. Thank you.
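One quick way to check whether a library file carries any classification at all is to look at its FASTA headers. The sketch below is only an illustration and rests on an assumption not stated in this post: that classified RepeatModeler libraries label each consensus as `>name#Class/Family` and that RepeatMasker reads the class from the text after `#`. The function name `count_repeat_classes` and the example headers are made up for the demonstration.

```python
from collections import Counter


def count_repeat_classes(fasta_lines):
    """Count consensus sequences per top-level class found in FASTA headers.

    Assumption: headers look like '>rnd-1_family-12#LTR/Gypsy ( ... )',
    where the text after '#' is Class/Family. Headers with no '#' are
    counted under 'no_label'.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""
        if "#" in seq_id:
            label = seq_id.split("#", 1)[1]
            top_level = label.split("/", 1)[0]
        else:
            top_level = "no_label"
        counts[top_level] += 1
    return counts


# Hypothetical headers, for illustration only (not taken from the real run).
example = [
    ">rnd-1_family-1#LTR/Gypsy ( RepeatScout Family Size = 100 )",
    "ACGTACGT",
    ">rnd-1_family-2#Unknown ( RepeatScout Family Size = 50 )",
    "TTGACA",
    ">rnd-1_family-3 ( RepeatScout Family Size = 20 )",
    "GGCCAA",
]

for cls, n in sorted(count_repeat_classes(example).items()):
    print(cls, n)
```

To run it on a real library, pass an open file handle instead of the list, e.g. `count_repeat_classes(open("consensi.fa"))`. If every header falls under `no_label` (or a single catch-all label), that would be consistent with RepeatMasker reporting everything as "Unclassified".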

==================================================
file name: species.fa         
sequences:        107111
total length:  279238173 bp  (275719484 bp excl N/X-runs)
GC level:         27.25 %
bases masked:   78241030 bp ( 28.02 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:                0            0 bp    0.00 %
      ALUs            0            0 bp    0.00 %
      MIRs            0            0 bp    0.00 %

LINEs:                0            0 bp    0.00 %
      LINE1           0            0 bp    0.00 %
      LINE2           0            0 bp    0.00 %
      L3/CR1          0            0 bp    0.00 %

LTR elements:         0            0 bp    0.00 %
      ERVL            0            0 bp    0.00 %
      ERVL-MaLRs      0            0 bp    0.00 %
      ERV_classI      0            0 bp    0.00 %
      ERV_classII     0            0 bp    0.00 %

DNA elements:         0            0 bp    0.00 %
     hAT-Charlie      0            0 bp    0.00 %
     TcMar-Tigger     0            0 bp    0.00 %

Unclassified:    656772     85428964 bp   30.59 %

Total interspersed repeats: 85428964 bp   30.59 %


Small RNA:            0            0 bp    0.00 %

Satellites:           0            0 bp    0.00 %
Simple repeats:  245893     10827053 bp    3.88 %
Low complexity:       0            0 bp    0.00 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element


The query species was assumed to be homo          
RepeatMasker Combined Database: Dfam_Consensus-20171107, RepBase-20170127
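For anyone comparing several runs, the per-class rows of a summary like the one above can be pulled out with a short script. This is a minimal sketch that assumes the rows keep the layout shown here (name, element count, `bp` length, percentage); the regular expression and the function name `parse_summary_rows` are my own and not part of RepeatMasker.

```python
import re

# Matches rows such as "Unclassified:    656772     85428964 bp   30.59 %".
# Rows without an element count (e.g. "Total interspersed repeats") are skipped.
ROW = re.compile(r"^\s*(.+?):?\s+(\d+)\s+(\d+) bp\s+([\d.]+) %")


def parse_summary_rows(text):
    """Return {row name: (element count, bp occupied, percent of sequence)}."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, count, bp, pct = match.groups()
            rows[name] = (int(count), int(bp), float(pct))
    return rows


# A few rows copied from the summary in this post.
sample = """\
SINEs:                0            0 bp    0.00 %
      ALUs            0            0 bp    0.00 %
LTR elements:         0            0 bp    0.00 %
Unclassified:    656772     85428964 bp   30.59 %
Total interspersed repeats: 85428964 bp   30.59 %
Simple repeats:  245893     10827053 bp    3.88 %
"""

rows = parse_summary_rows(sample)
# Keep only the classes that actually have elements assigned to them.
nonzero = {name: vals for name, vals in rows.items() if vals[0] > 0}
print(nonzero)
```

On the table posted here, only "Unclassified" and "Simple repeats" have non-zero counts, which is the symptom described in the question.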
5.6 years ago

Hello, this question was answered here: RepeatModeler GitHub
