I am trying to run RepeatMasker on an insect genome using the following parameters:
RepeatMasker -species Lepidoptera -nolow -dir RM_Lepidoptera_Intersped -cutoff 250 Genome.fasta
Unfortunately, I get no transposon hits (only some rRNA class repeats). I think this is because RepeatMasker is not using the RepBase library I downloaded; it prints the following message at the start of the analysis:
WARNING: Dfam 2.0 includes repeats found in human, mouse,
drosophila melanogaster, danio rerio and
caenorhabditis elegans. Searching with other species
will only search for ancestral repeats shared with
human and your species ( if any exist ) and will use the
"TC" cutoffs ( trusted cutoff ) instead of the
species-specific cutoffs.
Master RepeatMasker Database: /home/wolf/Desktop/Programs/RepeatMasker/Libraries/Dfam.hmm ( Complete Database: Dfam_2.0 )
When I run the queryRepeatDatabase.pl utility with these parameters:
./queryRepeatDatabase.pl -species Lepidoptera -stat -class DNA -class LTR
it returns a long list of transposons. I would expect at least one of them to be present in the sequenced genome.
So I tried to generate a custom library from data downloaded from RepBase (Hexapoda Transposable elements), but when using the -lib option like so:
RepeatMasker -cutoff 250 -dir RM_Lepidoptera_Intersped -lib RepBase_Hexapoda_TEs.fasta Genome.fasta
I get the error message:
Search Engine: HMMER [ 3.1b2 (February 2015) ]
RepeatMasker::createLib(): Error invoking /home/Programs/HMMER/binaries//hmmpress on file /home/3_Homology_based_approach/RM_31187.ThuNov301438412017/RepBase_Hexapoda_TEs.fasta.
I thought one could use a .fasta library with -lib? Or do I need to convert it first? If so, how?
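In case it helps with diagnosing this, here is a minimal sketch of how the format of the library file could be checked, since hmmpress operates on HMMER profile files rather than on sequence files. The function name lib_format is hypothetical, and the check rests on the assumption that a FASTA file starts with ">" and a HMMER3 profile file starts with "HMMER3"; it only looks at the first line of the file.

```shell
# Hypothetical helper: guess the format of a repeat library from its first line.
# Assumption: FASTA files begin with ">" and HMMER3 profile files begin with "HMMER3".
lib_format() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        ">"*)      echo "fasta" ;;    # sequence library
        "HMMER3"*) echo "hmm" ;;      # profile HMM library
        *)         echo "unknown" ;;  # anything else
    esac
}

# Example call on the library from the command above:
# lib_format RepBase_Hexapoda_TEs.fasta
```

If this prints fasta, the library is sequence data, so it would presumably need a sequence-based search engine rather than HMMER. RepeatMasker's -e option selects the engine (for example -e ncbi), assuming such an engine is installed and configured.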