RepeatMasker output looks weird: 0% for all repeat sequences
3.8 years ago
slin023 • 0

Greetings, I tried to use RepeatMasker to identify repeat sequences. According to the log file the run completed, but the overall result looks weird to me:

==================================================
file name: asm.contigs.filtered.fasta
sequences:          1499
total length:  638024656 bp  (638024656 bp excl N/X-runs)
GC level:         27.03 %
bases masked:   36845866 bp ( 5.77 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements            0            0 bp    0.00 %
   SINEs:                0            0 bp    0.00 %
   Penelope              0            0 bp    0.00 %
   LINEs:                0            0 bp    0.00 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex          0            0 bp    0.00 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B           0            0 bp    0.00 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:         0            0 bp    0.00 %
     BEL/Pao             0            0 bp    0.00 %
     Ty1/Copia           0            0 bp    0.00 %
     Gypsy/DIRS1         0            0 bp    0.00 %
       Retroviral        0            0 bp    0.00 %

DNA transposons          0            0 bp    0.00 %
   hobo-Activator        0            0 bp    0.00 %
   Tc1-IS630-Pogo        0            0 bp    0.00 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac              0            0 bp    0.00 %
   Tourist/Harbinger     0            0 bp    0.00 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:            0            0 bp    0.00 %

Total interspersed repeats:           0 bp    0.00 %


Small RNA:            1043      2107641 bp    0.33 %

Satellites:              0            0 bp    0.00 %
Simple repeats:     540298     28134835 bp    4.41 %
Low complexity:     130176      6603390 bp    1.03 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element


The query species was assumed to be phormia       
RepeatMasker Combined Database: Dfam_3.1

run with rmblastn version 2.10.0+

Here is my script:

#!/bin/bash
#SBATCH --qos pq_mdegenna
#SBATCH --account iacc_mdegenna   
#SBATCH --partition IB_16C_96G
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --output=log


module load RepeatMasker-4.1.0  

RepeatMasker -qq -pa 30 -species Phormia /scratch/mdegenna/slin023/star/asm.contigs.filtered.fasta

Did I do anything wrong? What does this result suggest? There should be tons of repeat sequences in a eukaryotic genome. Any feedback and suggestions are welcome.

genome sequencing • 2.2k views

It looks like you're using the repeats from the Dfam database. Is your genome from a species that is present in Dfam (or closely related to one)? If not, perhaps the evolutionary distance between your species and the species in Dfam is making it hard to accurately identify your TEs?

When I run RepeatMasker on a new genome, I typically first build a library of repeats from the genome assembly and use that for repeat masking instead of relying on a prebuilt database.

3.8 years ago
Michael 55k

There are no repeat models for Phormia in Dfam. You should install and run RepeatModeler first to generate them yourself, then use the generated repeat families with RepeatMasker, possibly also running RepeatMasker in sensitive mode (the -s option).
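
The workflow described above could be sketched roughly as follows. This is only a sketch under assumptions: the three programs are on PATH, the database name and genome file name are the ones used in this thread, and RepeatModeler writes its library to `<database>-families.fa` in the working directory. The `run` helper and the `DRY_RUN` switch are hypothetical additions so the commands can be inspected before anything is submitted to the cluster.

```shell
#!/bin/bash
# Sketch: build a de novo repeat library with RepeatModeler, then mask with it.
# File names and the database name are placeholders taken from this thread.
set -euo pipefail

GENOME="${GENOME:-asm.contigs.filtered.fasta}"
DB="${DB:-Phormia}"
JOBS="${JOBS:-4}"          # value for -pa (parallel jobs, not single threads)
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = actually run them

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

repeat_pipeline() {
    # 1. Index the assembly for RepeatModeler.
    run BuildDatabase -name "$DB" "$GENOME"
    # 2. Discover repeat families de novo (assumed output: "$DB-families.fa").
    run RepeatModeler -database "$DB" -pa "$JOBS" -LTRStruct
    # 3. Mask with the custom library (-lib) in sensitive mode (-s).
    run RepeatMasker -s -pa "$JOBS" -lib "$DB-families.fa" -gff "$GENOME"
}

repeat_pipeline
```

Run as-is it only prints the three commands; setting `DRY_RUN=0` executes them. Note that `-lib` replaces `-species`, so the masking step no longer depends on Phormia being present in Dfam.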

As a side note, avoid installing RepeatModeler via BioConda at the moment. The version I got was a non-functional development version. You can install some or most dependencies via conda, but not RepeatModeler itself.


Hi Michael, I followed the RepeatModeler guide, but I got very confused by the "Installation - Configuration" part. I am running on an HPC cluster. I tried the "Automatic" approach, running the configure script with supplied parameters:

perl /home/slin023/RepeatModeler-2.0.1/configure -rscout_dir /home/slin023/RepeatScout-1.0.6 -recon_dir /home/slin023/RECON-1.08 -rmblast_dir /home/slin023/rmblast-2.10.0 -trf_prgm /home/slin023/trf409.linux32 -ltr_retriever_dir /home/slin023/LTR_retriever-2.9.0 -genometools_dir /home/slin023/genometools-1.6.1

but the Perl on the cluster is missing some modules, and I have to ask the admin to install them:

The following perl modules required by RepeatModeler are missing from
your system.  Please install these first:
    JSON
    File::Which
    URI
    LWP::UserAgent

I could try editing "RepModelConfig.pm", but I don't know where I am supposed to edit it. I tried using the "configuration overrides" command-line options with the RepeatModeler program:

/home/slin023/RepeatModeler-2.0.1/RepeatModeler -rscout_dir /home/slin023/RepeatScout-1.0.6 -recon_dir /home/slin023/RECON-1.08 -rmblast_dir /home/slin023/rmblast-2.10.0 -trf_prgm /home/slin023/trf409.linux32 -ltr_retriever_dir /home/slin023/LTR_retriever-2.9.0 -genometools_dir /home/slin023/genometools-1.6.1

but I got this error:

Can't locate WUBlastSearchEngine.pm in @INC (@INC contains: /usr/local/RepeatMasker /home/slin023/RepeatModeler-2.0.1 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /home/slin023/RepeatModeler-2.0.1/RepeatUtil.pm line 78.
BEGIN failed--compilation aborted at /home/slin023/RepeatModeler-2.0.1/RepeatUtil.pm line 78.
Compilation failed in require at /home/slin023/RepeatModeler-2.0.1/RepeatModeler line 126.
BEGIN failed--compilation aborted at /home/slin023/RepeatModeler-2.0.1/RepeatModeler line 126.

Here is what I have in my script so far:

#!/bin/bash
#SBATCH --qos pq_mdegenna
#SBATCH --account iacc_mdegenna
#SBATCH --partition IB_16C_96G
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --output=log

module load perl-5.30.3-gcc-8.2.0-cbdkyxc
export PATH=$PATH:/home/slin023/RepeatModeler-2.0.1
export PATH=$PATH:/home/slin023/LTR_retriever-2.9.0
export PATH=$PATH:/home/slin023/genometools-1.6.1
export PATH=$PATH:/home/slin023/RECON-1.08
export PATH=$PATH:/home/slin023/RepeatScout-1.0.6
export PATH=$PATH:/home/slin023/rmblast-2.10.0
export PATH=$PATH:/home/slin023/trf409.linux32

perl /home/slin023/RepeatModeler-2.0.1/configure -rscout_dir /home/slin023/RepeatScout-1.0.6 -recon_dir /home/slin023/RECON-1.08 -rmblast_dir /home/slin023/rmblast-2.10.0 -trf_prgm /home/slin023/trf409.linux32 -ltr_retriever_dir /home/slin023/LTR_retriever-2.9.0 -genometools_dir /home/slin023/genometools-1.6.1 

/home/slin023/RepeatModeler-2.0.1/BuildDatabase -name Phormia /scratch/mdegenna/slin023/DATASETS/ASSEMBLY_NAME/asm.contigs.filtered.fasta

/home/slin023/RepeatModeler-2.0.1/RepeatModeler -database Phormia -pa 16 -LTRStruct >& run.out

Any suggestions are welcome. If you have an example script, that would be really helpful. Thank you.


Could you install the Perl modules? They are needed for RepeatModeler to function, so there is no point in trying to run it without them. Stick to the configure script for the time being.

 Please install these first:
JSON
File::Which
URI
LWP::UserAgent

WUBlastSearchEngine.pm is a Perl module that ships with RepeatMasker. You need to configure RepeatModeler correctly and give it the path where RepeatMasker is installed. Do not use the command-line override options; they will just lead to confusion.
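
The @INC list in the error includes /usr/local/RepeatMasker, which suggests RepeatModeler fell back to a default location because no RepeatMasker path was configured. A rough sketch for finding the right directory is below. It assumes the `RepeatMasker` program is on PATH (for example after `module load RepeatMasker-4.1.0`) and that `readlink -f` is available, as on most Linux systems. The function name is hypothetical, and the `-repeatmasker_dir` option in the usage comment is how I remember the configure script; please check it against the RepeatModeler README.

```shell
#!/bin/bash
# Sketch: locate the RepeatMasker install directory that RepeatModeler's
# configure script needs. The function name is illustrative.

# Print the directory of the RepeatMasker program found on PATH, but only
# if that directory also contains WUBlastSearchEngine.pm.
find_repeatmasker_dir() {
    local exe dir
    exe="$(command -v RepeatMasker)" || {
        echo "RepeatMasker is not on PATH" >&2
        return 1
    }
    # Resolve symlinks, since module systems often link into a bin/ directory.
    exe="$(readlink -f "$exe")"
    dir="$(dirname "$exe")"
    if [ -f "$dir/WUBlastSearchEngine.pm" ]; then
        echo "$dir"
    else
        echo "WUBlastSearchEngine.pm not found in $dir" >&2
        return 1
    fi
}

# Example use (assumed option name, other tool paths omitted):
#   RM_DIR="$(find_repeatmasker_dir)"
#   perl /home/slin023/RepeatModeler-2.0.1/configure -repeatmasker_dir "$RM_DIR" ...
```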


Hi Michael, this may sound like a silly question. I installed all the Perl modules through a bioconda environment and started entering the path to each package, but I don't understand what is wrong with my RepeatScout path (screenshot not preserved). My RepeatScout is version 1.0.6, and as you can see, it is there (screenshot not preserved).

Maybe it is because the "RepeatScout" executable is missing inside /home/slin023/RepeatScout?


Try installing RepeatScout via conda; I think it is missing from your path.
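
If you prefer to keep the manual install, it may be worth checking whether the directory given to configure really contains the compiled programs. As far as I know, RepeatScout is distributed as source code, so the `RepeatScout` and `build_lmer_table` binaries only exist after running `make` in that directory. A small sketch, with a hypothetical helper name and the path used earlier in this thread:

```shell
#!/bin/bash
# Sketch: check that a tool directory contains the expected executables.
# check_executable is an illustrative helper, not part of any package.

check_executable() {
    local dir="$1" prog="$2"
    # Must be an executable regular file, not a directory of the same name.
    if [ -f "$dir/$prog" ] && [ -x "$dir/$prog" ]; then
        echo "ok: $dir/$prog"
    else
        echo "missing or not executable: $dir/$prog"
        return 1
    fi
}

# Path from this thread; override RSCOUT_DIR to test another location.
RSCOUT_DIR="${RSCOUT_DIR:-/home/slin023/RepeatScout-1.0.6}"
for prog in RepeatScout build_lmer_table; do
    check_executable "$RSCOUT_DIR" "$prog" || true
done
```

If either program is reported missing, running `make` inside the RepeatScout directory and re-running configure would be the next thing to try.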


I have configured the paths to all the modules, and it says RepeatModeler is ready to use. When I ran

/home/slin023/RepeatModeler-2.0.1/BuildDatabase -name Phormia /home/slin023/asm.contigs.fasta

this is what it showed: (screenshot not preserved)

And these are the first 5 lines of "asm.contigs.fasta": (screenshot not preserved)

So is the input file format wrong? If you have any suggestions, please let me know.
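
Since the screenshots did not survive, the actual error is unknown. A few generic sanity checks on the FASTA file could be sketched like this. These are assumptions about common problems (empty file, compressed or non-FASTA content, Windows line endings), not a diagnosis of the BuildDatabase message, and `check_fasta` is a made-up name.

```shell
#!/bin/bash
# Sketch: basic sanity checks on a FASTA file. The function name is illustrative.

check_fasta() {
    local fasta="$1"
    # File must exist and be non-empty.
    [ -s "$fasta" ] || { echo "empty or missing: $fasta"; return 1; }
    # The first character must be '>' (fails for gzipped or non-FASTA input).
    if [ "$(head -c 1 "$fasta")" != ">" ]; then
        echo "first character is not '>'"
        return 1
    fi
    # Windows (CRLF) line endings can confuse downstream tools.
    if grep -q $'\r' "$fasta"; then
        echo "carriage returns found (CRLF line endings?)"
        return 1
    fi
    # Report the number of header lines, i.e. sequences.
    echo "sequences: $(grep -c '^>' "$fasta")"
}

# Example:
#   check_fasta /home/slin023/asm.contigs.fasta
```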
