5.5 years ago
SeaStar
Hello! I'm analyzing the genome of a cephalopod. I have my genome.fa and my custom repeat library. I ran this RepeatMasker command:
$:~/RepeatMasker -lib repeatlib.fa -dir output_file mygenome.fa
Is it correct? Or do I have to add something, like the species? I ask because the generated summary appears to contain no classified elements:
==================================================
file name: mygenome.fa
sequences:          1000
total length:    1052553 bp  (1041046 bp excl N/X-runs)
GC level:         34.60 %
bases masked:     697079 bp ( 66.23 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:                 0            0 bp    0.00 %
      ALUs             0            0 bp    0.00 %
      MIRs             0            0 bp    0.00 %
LINEs:                 0            0 bp    0.00 %
      LINE1            0            0 bp    0.00 %
      LINE2            0            0 bp    0.00 %
      L3/CR1           0            0 bp    0.00 %
LTR elements:          0            0 bp    0.00 %
      ERVL             0            0 bp    0.00 %
      ERVL-MaLRs       0            0 bp    0.00 %
      ERV_classI       0            0 bp    0.00 %
      ERV_classII      0            0 bp    0.00 %
DNA elements:          0            0 bp    0.00 %
      hAT-Charlie      0            0 bp    0.00 %
      TcMar-Tigger     0            0 bp    0.00 %
Unclassified:       5436       722760 bp   68.67 %
Total interspersed repeats:    722760 bp   68.67 %
Small RNA:             0            0 bp    0.00 %
Satellites:            0            0 bp    0.00 %
Simple repeats:     1511        93735 bp    8.91 %
Low complexity:        0            0 bp    0.00 %
==================================================
* most repeats fragmented by insertions or deletions
  have been counted as one element
The query species was assumed to be homo
RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026
run with rmblastn version 2.6.0+
The query was compared to unclassified sequences in ".../repeatlib.fa"
thank you!!
I think that for elements to show up in the table, the FASTA headers of the repeat library need to have a specific format, e.g.:
>seq1#LTR/ERV1
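If it helps, here is a minimal sketch for checking a custom library before the run, assuming the `>name#Class/Family` header convention from the example above. The function name, the regular expression, and the sample records are illustrative only and are not part of RepeatMasker.

```python
import re

# Assumed header convention: ">name#Class" or ">name#Class/Family",
# optionally followed by whitespace and a free-text description.
HEADER_RE = re.compile(r"^>(?P<name>[^\s#]+)#(?P<cls>[^\s/]+)(?:/(?P<family>\S+))?")


def check_library_headers(lines):
    """Split FASTA headers into those with and without a '#Class' tag.

    Returns (classified, unclassified):
      classified   -> list of (name, class, family-or-None) tuples
      unclassified -> list of sequence names lacking a '#Class' tag
    """
    classified, unclassified = [], []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = HEADER_RE.match(line)
        if match:
            classified.append((match["name"], match["cls"], match["family"]))
        else:
            parts = line[1:].split()
            unclassified.append(parts[0] if parts else "")
    return classified, unclassified


# Invented example records, for illustration only.
example_library = [
    ">seq1#LTR/ERV1",
    "ACGTACGT",
    ">seq2",
    "TTGACCA",
    ">seq3#DNA/hAT-Charlie some description",
    "GGCATCA",
]
tagged, untagged = check_library_headers(example_library)
print("with class tag:", tagged)
print("without class tag:", untagged)
```

To run it on a real library, pass an open file handle instead of the list, e.g. `check_library_headers(open("repeatlib.fa"))`. Headers that end up in the second list are the ones I would expect to be lumped together in the summary.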
The masking did happen, cf. the "bases masked" line in your output (697079 bp, 66.23 %). But as microfuge pointed out, the summary table might be incomplete because it is simply not able to classify the repeats it found. In essence that's not a big issue, as the most important thing is that it masked what needed to be masked.
This is correct: the default summary table mostly checks for human repeat families. There is a script called buildSummary.pl in the util folder of RepeatMasker which builds a better summary based on the .out files. See this for an output example: RepeatMasker:understanding buildSummary.pl output
OK. So the elements are not reported in this table, but I'll probably find them in mygenome.out.fa, right? The .out.tbl file is not essential for me; I don't need to build the new summary.
I don't know by heart, but there is certainly an output file (might be the out.tbl?) that denotes which elements were used to mask a given region, using the FASTA IDs from the library you provided.
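For what it's worth, the per-hit annotation table is the file RepeatMasker writes with the `.out` suffix (the `.tbl` file is the summary shown in the question). Below is a rough sketch of tallying hits per library element from that table, assuming the usual whitespace-separated layout in which columns 6 and 7 are the query begin and end, column 10 is the matching repeat name, and column 11 is its class/family. The function name and the sample rows are invented for illustration, including the class label in the last row.

```python
from collections import defaultdict


def tally_repeats(lines):
    """Count hits and sum masked length per (repeat name, class/family).

    Expects lines from a RepeatMasker .out annotation table. Header and
    blank lines are skipped because they do not start with an integer score.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [hits, bp]
    for line in lines:
        fields = line.split()
        if len(fields) < 15 or not fields[0].isdigit():
            continue  # not a data row
        begin, end = int(fields[5]), int(fields[6])
        key = (fields[9], fields[10])  # (matching repeat, class/family)
        totals[key][0] += 1
        totals[key][1] += end - begin + 1
    return {key: tuple(value) for key, value in totals.items()}


# Invented rows in the assumed .out layout, for illustration only.
example_out = [
    "   SW   perc perc perc  query      position in query     matching  repeat       position in repeat",
    "score   div. del. ins.  sequence   begin  end   (left)   repeat    class/family begin end (left)  ID",
    "",
    "  463   1.3  0.6  1.7  scaffold1    100  199   (800) +  seq1      LTR/ERV1        1  100   (50)   1",
    "  300   5.0  0.0  0.0  scaffold1    400  449   (550) C  seq1      LTR/ERV1      (0)   50      1   2",
    "  250  10.0  0.0  0.0  scaffold2     10   39   (961) +  seq2      Unspecified     1   30    (0)   3",
]
for (name, cls), (hits, bp) in sorted(tally_repeats(example_out).items()):
    print(f"{name}\t{cls}\t{hits} hits\t{bp} bp")
```

This is a naive tally: it does not merge fragments of the same element or subtract overlapping hits, so its numbers will not necessarily match the RepeatMasker summary or buildSummary.pl.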
Here are some elements from my library as examples:
Is the program not able to recognize elements like these?