Question

repeatmasker with my library

1

6.1 years ago

SeaStar ▴ 50

Hello! I'm analyzing the genome of a cephalopod. I have my genome.fa and a custom repeat library. I ran RepeatMasker with this command:

$:~/RepeatMasker -lib repeatlib.fa -dir output_file mygenome.fa

Is this correct, or do I have to add something like the species? I ask because the generated output appears to contain no classified elements:

==================================================
file name: mygenome.fa       
sequences:          1000
total length:    1052553 bp  (1041046 bp excl N/X-runs)
GC level:         34.60 %
bases masked:     697079 bp ( 66.23 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:                0            0 bp    0.00 %
      ALUs            0            0 bp    0.00 %
      MIRs            0            0 bp    0.00 %

LINEs:                0            0 bp    0.00 %
      LINE1           0            0 bp    0.00 %
      LINE2           0            0 bp    0.00 %
      L3/CR1          0            0 bp    0.00 %

LTR elements:         0            0 bp    0.00 %
      ERVL            0            0 bp    0.00 %
      ERVL-MaLRs      0            0 bp    0.00 %
      ERV_classI      0            0 bp    0.00 %
      ERV_classII     0            0 bp    0.00 %

DNA elements:         0            0 bp    0.00 %
     hAT-Charlie      0            0 bp    0.00 %
     TcMar-Tigger     0            0 bp    0.00 %

Unclassified:      5436       722760 bp   68.67 %

Total interspersed repeats:   722760 bp   68.67 %


Small RNA:            0            0 bp    0.00 %

Satellites:           0            0 bp    0.00 %
Simple repeats:    1511        93735 bp    8.91 %
Low complexity:       0            0 bp    0.00 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element


The query species was assumed to be homo          
RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026

run with rmblastn version 2.6.0+
The query was compared to unclassified sequences in ".../repeatlib.fa"

thank you!!

genome • 3.5k views

ADD COMMENT • link updated 6.0 years ago by Biostar 20 • written 6.1 years ago by SeaStar ▴ 50

0

I think that for elements to show up in the table, the FASTA headers in the repeat library need to follow a specific format, name#class/family, e.g.:

>seq1#LTR/ERV1

ADD REPLY • link 6.1 years ago by microfuge ★ 2.0k
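
A quick way to see whether a library follows the header convention mentioned above is to split each header on the `#` and `/` characters. The following is only a minimal sketch under that assumption; the function name `parse_repeat_header` and the second example header are made up for illustration and are not part of RepeatMasker.

```python
def parse_repeat_header(header):
    """Split a FASTA header into (name, repeat_class, family).

    Assumes the '>name#class/family' convention; repeat_class and family
    are None when the '#class' part is absent.
    """
    # Drop the leading '>' and ignore any description after the first space.
    fields = header.lstrip(">").split()
    token = fields[0] if fields else ""
    name, sep, classification = token.partition("#")
    if not sep or not classification:
        return name, None, None
    repeat_class, _, family = classification.partition("/")
    return name, repeat_class, family or None


if __name__ == "__main__":
    # The first header is the example from the reply; the second is hypothetical.
    for header in (">seq1#LTR/ERV1", ">seq2 some description"):
        print(header, "->", parse_repeat_header(header))
```

Headers for which the class comes back as `None` are the ones that would most likely end up in the "Unclassified" row of the summary table.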

0

The masking did happen; cf. this line:

bases masked:     697079 bp ( 66.23 %)

But, as microfuge pointed out, the summary table may be incomplete because RepeatMasker is simply unable to assign the repeats it found to a class. In essence that is not a big issue, as the most important thing is that it did mask what needed to be masked.

ADD REPLY • link 6.1 years ago by lieven.sterck 15k

2

This is correct: the output summary table mostly checks for human repeats. There is a script called buildSummary.pl in the util folder of RepeatMasker that builds a better summary from the .out files.

For an example of its output, see the post "RepeatMasker: understanding buildSummary.pl output".

ADD REPLY • link 6.1 years ago by Philipp Bayer 8.8k
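
To illustrate the idea of building a summary from the .out file, here is a rough Python sketch that tallies hits and covered bases per class/family. It is not a reimplementation of buildSummary.pl: it assumes the usual whitespace-separated .out layout (score, divergence, deletions, insertions, query name, begin, end, left, strand, repeat name, class/family, ...), and it does not merge fragments or overlapping hits, so its numbers will generally differ from the official summary. The function name `summarize_out` is hypothetical.

```python
from collections import defaultdict


def summarize_out(lines):
    """Tally hits and covered bases per repeat class/family from .out lines."""
    totals = defaultdict(lambda: [0, 0])  # class/family -> [hits, bp]
    for line in lines:
        fields = line.split()
        # Data rows start with the integer alignment score; this skips the
        # column-header lines and blank lines.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        entry = totals[fields[10]]
        entry[0] += 1
        entry[1] += end - begin + 1  # coordinates are 1-based and inclusive
    return {key: tuple(value) for key, value in totals.items()}
```

Assuming the annotation table sits next to the other results (for the command in the question, presumably `output_file/mygenome.fa.out`), it could be used as `with open(path) as handle: print(summarize_out(handle))`.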

0

OK. So the elements are not reported in this table, but I'll probably find them in mygenome.out.fa, right? The .out.tbl file is not essential for me; I don't need to build the new summary.

ADD REPLY • link 6.1 years ago by SeaStar ▴ 50

1

I don't know by heart, but there is certainly an output file (it might be the out.tbl?) that records which elements were used to mask a given region, using the FASTA IDs from the library you provided.

ADD REPLY • link 6.1 years ago by lieven.sterck 15k
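
As far as I know, the per-hit table is the file ending in .out (the .tbl file holds only the summary shown in the question). Assuming its usual whitespace-separated layout, a small sketch like the one below lists which library entries overlap a given region. The function name `hits_in_region` is made up for illustration.

```python
def hits_in_region(lines, seq_id, start, end):
    """Return (begin, end, strand, repeat_name) for hits overlapping
    seq_id:start-end, using 1-based inclusive coordinates."""
    hits = []
    for line in lines:
        fields = line.split()
        # Keep only data rows, which start with the integer alignment score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        q_begin, q_end = int(fields[5]), int(fields[6])
        if fields[4] == seq_id and q_begin <= end and q_end >= start:
            # fields[8] is the strand ('+' or 'C'), fields[9] the library ID.
            hits.append((q_begin, q_end, fields[8], fields[9]))
    return hits
```

The repeat name in each returned tuple is the ID taken from the custom library, so it can be traced back to the FASTA entry that produced the match.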

0

Here are some elements from my library as an example:

>Gypsy-5-I_BF1 RB:3e-08 89% 86
GGTCAATAGGAGGTTGGATCTTAGTTGGCAGGGTGGTTTTATATTTCCTGCCATTCAGCATTTCTGCTGGGGATTTCATGTCAGCT
>Penelope-9_HM_Penelope_Hydra1 RB:2e-08 88% 267
AAGTTTCGTAAATCGCCATACAAGAACCAACATTTGAAATATCTTAATACTGTTACCAAACAAGTGAAAAGTGATAAAGGAATTTTCGTTAAATCTGACAAGACTAGAAATATTTATAAACTGAATAAGGAGCATTACATGAATTTACTTAGGAAGGAGATTGAAAAAAATTATAAAATTACAAATGGATGGACGCTCAGAAAGACCAATTTGGATGTTAAGAAACTAATGGAGAAATATAATATTGCGGACAGAACTGAACCTATA

Is the program not able to recognize elements like these?

ADD REPLY • link 6.1 years ago by SeaStar ▴ 50
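
Headers like the two above carry the family only as part of the name, with no `#class/family` suffix, which would explain why the hits land in "Unclassified". One possible workaround is to rewrite the headers before running RepeatMasker. The sketch below does that from a name-prefix lookup. The lookup table and both function names are assumptions for illustration only, so the mapping should be checked against the actual families in the library.

```python
# Hypothetical prefix-to-class table; extend it to cover your own library.
PREFIX_TO_CLASS = {
    "Gypsy": "LTR/Gypsy",
    "Penelope": "LINE/Penelope",
}


def add_class_to_header(header, prefix_to_class=PREFIX_TO_CLASS, default="Unknown"):
    """Rewrite '>name description' as '>name#class/family description'."""
    name, _, description = header[1:].partition(" ")
    if "#" in name:
        return header  # already classified, leave untouched
    repeat_class = default
    for prefix, value in prefix_to_class.items():
        if name.startswith(prefix):
            repeat_class = value
            break
    new_header = f">{name}#{repeat_class}"
    return f"{new_header} {description}" if description else new_header


def annotate_library(lines):
    """Yield FASTA lines with headers rewritten and sequence lines unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        yield add_class_to_header(line) if line.startswith(">") else line
```

Writing the result of `annotate_library(open("repeatlib.fa"))` to a new file and passing that file to `-lib` would be the natural next step, though I have not tested this on a real run.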