RepeatMasker with my library
5.5 years ago
SeaStar ▴ 50

Hello! I'm analyzing the genome of a cephalopod. I have my genome.fa and my custom library. I ran RepeatMasker with this command:

$:~/RepeatMasker -lib repeatlib.fa -dir output_file mygenome.fa

Is it correct? Or do I have to add something like the species? I ask because the generated output appears to contain no classified elements:

==================================================
file name: mygenome.fa       
sequences:          1000
total length:    1052553 bp  (1041046 bp excl N/X-runs)
GC level:         34.60 %
bases masked:     697079 bp ( 66.23 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:                0            0 bp    0.00 %
      ALUs            0            0 bp    0.00 %
      MIRs            0            0 bp    0.00 %

LINEs:                0            0 bp    0.00 %
      LINE1           0            0 bp    0.00 %
      LINE2           0            0 bp    0.00 %
      L3/CR1          0            0 bp    0.00 %

LTR elements:         0            0 bp    0.00 %
      ERVL            0            0 bp    0.00 %
      ERVL-MaLRs      0            0 bp    0.00 %
      ERV_classI      0            0 bp    0.00 %
      ERV_classII     0            0 bp    0.00 %

DNA elements:         0            0 bp    0.00 %
     hAT-Charlie      0            0 bp    0.00 %
     TcMar-Tigger     0            0 bp    0.00 %

Unclassified:      5436       722760 bp   68.67 %

Total interspersed repeats:   722760 bp   68.67 %


Small RNA:            0            0 bp    0.00 %

Satellites:           0            0 bp    0.00 %
Simple repeats:    1511        93735 bp    8.91 %
Low complexity:       0            0 bp    0.00 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element


The query species was assumed to be homo          
RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026

run with rmblastn version 2.6.0+
The query was compared to unclassified sequences in ".../repeatlib.fa"

Thank you!

genome • 3.1k views

I think that, for elements to show up in the table, the FASTA headers in the repeat library need to have a specific format, e.g.:

>seq1#LTR/ERV1
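
As a rough illustration of the header format mentioned above, here is a minimal Python sketch that appends a `#class/family` tag to library headers that lack one. The keyword table and the function names (`classify`, `relabel_header`) are hypothetical, and guessing the class from the element name is only an assumption that happens to fit names like the ones in this thread; a curated classification should be preferred when one is available.

```python
# Hypothetical keyword -> "class/family" table; extend or replace it for your library.
KEYWORD_TO_CLASS = {
    "gypsy": "LTR/Gypsy",
    "copia": "LTR/Copia",
    "penelope": "LINE/Penelope",
}


def classify(name):
    """Guess a class/family label from the element name (assumption: the name contains the family)."""
    lowered = name.lower()
    for keyword, label in KEYWORD_TO_CLASS.items():
        if keyword in lowered:
            return label
    return "Unknown"


def relabel_header(line):
    """Add '#class/family' to a FASTA header line; leave sequence lines and tagged headers alone."""
    line = line.rstrip("\n")
    if not line.startswith(">"):
        return line  # sequence line
    name, _, rest = line[1:].partition(" ")
    if "#" in name:
        return line  # already tagged
    tagged = f">{name}#{classify(name)}"
    return f"{tagged} {rest}" if rest else tagged


def relabel_fasta(lines):
    """Apply relabel_header to every line of a FASTA file."""
    return [relabel_header(line) for line in lines]
```

To use it, read `repeatlib.fa` line by line, pass the lines to `relabel_fasta`, and write the result to a new file that is then given to `-lib`.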


The masking did happen; see this line:

bases masked:     697079 bp ( 66.23 %)

But, as microfuge pointed out, the summary table may be incomplete simply because RepeatMasker cannot classify the repeats it found. That is not a big issue: the most important thing is that it masked what needed to be masked.


This is correct. The default summary table mostly checks for human repeats. There is a script called buildSummary.pl in the util folder of RepeatMasker that builds a better summary from the .out files.

For an example of its output, see the post "RepeatMasker: understanding buildSummary.pl output".


OK. So the elements are not reported in this table, but I'll probably find them in mygenome.fa.out, right? The .tbl file is not essential for me, so I don't need to build a new summary.


I don't know by heart, but there is certainly an output file (probably the .out file rather than the .tbl summary) that lists which elements were used to mask each region, using the FASTA IDs from the library you provided.
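
To check which library entries produced the masking, the per-hit annotation file can be tallied directly. The sketch below is a minimal Python example, assuming the usual whitespace-separated layout of a RepeatMasker .out file (query begin and end in columns 6 and 7, repeat name in column 10, class/family in column 11). The function name `summarize_out` is hypothetical. It counts individual hits, so fragments of one element are counted separately and overlapping hits are counted twice, which means the totals may differ from those in the .tbl summary.

```python
from collections import defaultdict


def summarize_out(lines, column=10):
    """Tally hits and bp per label from RepeatMasker .out lines.

    column=10 groups by repeat class/family; column=9 groups by repeat name
    (the FASTA ID from the library). Indices are 0-based.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        label = fields[column]
        totals[label][0] += 1                # number of hits
        totals[label][1] += end - begin + 1  # bp covered by this hit
    return {label: tuple(counts) for label, counts in totals.items()}
```

With the command from the question, the file would be expected at `output_file/mygenome.fa.out`; for example, `summarize_out(open("output_file/mygenome.fa.out"), column=9)` groups the hits by library entry name.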


Here are some example entries from my library:

>Gypsy-5-I_BF1 RB:3e-08 89% 86
GGTCAATAGGAGGTTGGATCTTAGTTGGCAGGGTGGTTTTATATTTCCTGCCATTCAGCATTTCTGCTGGGGATTTCATGTCAGCT
>Penelope-9_HM_Penelope_Hydra1 RB:2e-08 88% 267
AAGTTTCGTAAATCGCCATACAAGAACCAACATTTGAAATATCTTAATACTGTTACCAAACAAGTGAAAAGTGATAAAGGAATTTTCGTTAAATCTGACAAGACTAGAAATATTTATAAACTGAATAAGGAGCATTACATGAATTTACTTAGGAAGGAGATTGAAAAAAATTATAAAATTACAAATGGATGGACGCTCAGAAAGACCAATTTGGATGTTAAGAAACTAATGGAGAAATATAATATTGCGGACAGAACTGAACCTATA

Is the program not able to recognize elements like these?
