Hi everybody, I finally got RepeatMasker to run, but it produces a summary table like this:
file name: OB_100DEC.fa
sequences: 100
total length: 209466439 bp (184235452 bp excl N/X-runs)
GC level: 35.24 %
bases masked: 73100024 bp ( 34.90 %)
                      number of        length    percentage
                      elements*      occupied   of sequence
SINEs:                        0          0 bp        0.00 %
      ALUs                    0          0 bp        0.00 %
      MIRs                    0          0 bp        0.00 %
LINEs:                        0          0 bp        0.00 %
      LINE1                   0          0 bp        0.00 %
      LINE2                   0          0 bp        0.00 %
      L3/CR1                  0          0 bp        0.00 %
LTR elements:                 0          0 bp        0.00 %
      ERVL                    0          0 bp        0.00 %
      ERVL-MaLRs              0          0 bp        0.00 %
      ERV_classI              0          0 bp        0.00 %
      ERV_classII             0          0 bp        0.00 %
DNA elements:                 0          0 bp        0.00 %
      hAT-Charlie             0          0 bp        0.00 %
      TcMar-Tigger            0          0 bp        0.00 %
Unclassified:            560941   66374119 bp       31.69 %
Total interspersed repeats:       66374119 bp       31.69 %
Small RNA:                    0          0 bp        0.00 %
Satellites:                   0          0 bp        0.00 %
Simple repeats:          290711   15358198 bp        7.33 %
Low complexity:               0          0 bp        0.00 %
* most repeats fragmented by insertions or deletions have been counted as one element
The query species was assumed to be homo
RepeatMasker Combined Database: Dfam_Consensus-20181026
run with rmblastn version 2.6.0+
The query was compared to unclassified sequences in ".../OB_100DEC_repeats_filtered1.fa"
I used RepeatScout to generate the library, and this was my command:
./RepeatMasker -s -lib /home/RepeatScout-1.0.5/OB_100DEC_repeats_filtered1.fa /home/Workdirectory/OB_100DEC.fa
Can anyone explain why no TEs are reported? The FASTA genome file is about 200 Mb and consists of the 100 largest contigs of my genome. Thank you.
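As a sanity check, the percentages in the table can be recomputed from the base-pair counts. The sketch below uses only numbers copied from the table above; the variable and function names are illustrative. It suggests that the percentages are relative to the total length (including N/X-runs) and that the whole interspersed-repeat total sits in the Unclassified row, so repeats were detected but not assigned to a class.

```python
# Numbers copied from the RepeatMasker summary table in the question.
total_bp = 209_466_439         # "total length"
masked_bp = 73_100_024         # "bases masked"
unclassified_bp = 66_374_119   # "Unclassified" row
interspersed_bp = 66_374_119   # "Total interspersed repeats" row
simple_bp = 15_358_198         # "Simple repeats" row


def pct(bp, total=total_bp):
    """Percentage of the total sequence length, rounded to two decimals."""
    return round(100 * bp / total, 2)


print("masked:        ", pct(masked_bp))
print("unclassified:  ", pct(unclassified_bp))
print("simple repeats:", pct(simple_bp))
# All interspersed repeats are in the Unclassified row.
print("all interspersed repeats unclassified:", unclassified_bp == interspersed_bp)
```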
I think the TE type information is taken from the FASTA headers of the repeat library, for example:
>SINEC2A2_CF#SINE/tRNA RepbaseID: SINEC2A2_CFXX
Can you check whether the FASTA headers of your repeat library look like this?
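One way to run this check is to count how many headers in the library carry a `#Class/Family` suffix, as in the example header above. This is only a minimal sketch: the function name `count_classified_headers` is hypothetical, and it assumes the `name#Class/Family` convention seen in that example header.

```python
from collections import Counter


def count_classified_headers(fasta_lines):
    """Tally the class label after '#' in each FASTA header line.

    Headers without a '#Class/Family' suffix are tallied under 'no_label'.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        # First whitespace-delimited token, e.g. "SINEC2A2_CF#SINE/tRNA"
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""
        label = seq_id.split("#", 1)[1] if "#" in seq_id else ""
        if label:
            # Keep only the class part before '/', e.g. "SINE"
            counts[label.split("/", 1)[0]] += 1
        else:
            counts["no_label"] += 1
    return counts


example = [
    ">SINEC2A2_CF#SINE/tRNA RepbaseID: SINEC2A2_CFXX",
    "ACGTACGT",
    ">unlabelled_repeat_1",  # hypothetical header with no class label
    "TTGACCA",
]
print(count_classified_headers(example))
```

To run it on a real library, pass an open file handle instead of the example list, e.g. `count_classified_headers(open("my_library.fa"))`, where `my_library.fa` is a placeholder name. If most headers land in `no_label`, the repeats having no assigned class in the summary table would be expected.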
Nope. I generated the library myself using RepeatScout, and the header of the FASTA file is:
Thanks and sorry that I don't have a direct answer.
What we do is use RepeatModeler (http://www.repeatmasker.org/RepeatModeler/). Then we use queryRepeatDatabase.pl, which comes with RepeatMasker, to dump the repeats for the species we work on (e.g. -species Carnivora). We then combine the repeat libraries generated from both (RepeatModeler and queryRepeatDatabase.pl) into a single repeat library file for masking.
Also, just in case: if you installed RepeatMasker yourself, the repeat libraries need to be downloaded from Repbase (https://www.girinst.org/repbase/), which requires registration, and RepeatMasker must be configured to use them.
I forgot this step and it resulted in very low masking, which was resolved after downloading the repeat libraries from Repbase.
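Combining the two libraries amounts to concatenating the FASTA files. A minimal sketch of that step follows; the function name `combine_libraries` and any file names passed to it are hypothetical and not part of RepeatMasker or RepeatModeler.

```python
from pathlib import Path


def combine_libraries(input_paths, output_path):
    """Concatenate FASTA libraries into one file and return the record count."""
    n_records = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # avoid gluing a sequence onto the next header
            # Count records by their header lines
            n_records += sum(1 for ln in text.splitlines() if ln.startswith(">"))
            out.write(text)
    return n_records
```

For example, `combine_libraries(["repeatmodeler_families.fa", "known_repeats.fa"], "combined_lib.fa")` (placeholder file names) would write one file that can then be given to RepeatMasker with `-lib`, as in the command earlier in this thread.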
Hi! I downloaded Repbase from girinst months ago. Now I can't download it again, even though I'm registered, because it requires a submission from my institute and I don't know how to obtain it. Anyway, as I said, I have this Repbase library installed with the program. As you can see in my message above, the output says "The query species was assumed to be homo" and "RepeatMasker Combined Database: Dfam_Consensus-20181026". I don't understand what the command line should be to run the program. Could you help me by showing an example?
OK. You can check whether your installed repeat libraries contain different types of TEs with the script that comes with RepeatMasker:
queryRepeatDatabase.pl -species human -stat
This should list the different TE types (LINE, SINE, ...). If the output does not contain these TE types, then the result you are getting would make sense.
OK. In this Repbase library I found two .embl files with different types of TEs. They are not in FASTA format. Are they usable? The species in my analysis is an invertebrate (a cephalopod), so which species should I use? Thank you again.