Question

repeatmasker empty output

5.6 years ago

Chironex ▴ 50

Hi everybody, I finally ran RepeatMasker, but it produced a summary table like this:

file name: OB_100DEC.fa
sequences:           100
total length:  209466439 bp  (184235452 bp excl N/X-runs)
GC level:         35.24 %
bases masked:   73100024 bp ( 34.90 %)

                 number of      length   percentage
                 elements*    occupied  of sequence

SINEs:                   0           0 bp    0.00 %
      ALUs               0           0 bp    0.00 %
      MIRs               0           0 bp    0.00 %

LINEs:                   0           0 bp    0.00 %
      LINE1              0           0 bp    0.00 %
      LINE2              0           0 bp    0.00 %
      L3/CR1             0           0 bp    0.00 %

LTR elements:            0           0 bp    0.00 %
      ERVL               0           0 bp    0.00 %
      ERVL-MaLRs         0           0 bp    0.00 %
      ERV_classI         0           0 bp    0.00 %
      ERV_classII        0           0 bp    0.00 %

DNA elements:            0           0 bp    0.00 %
      hAT-Charlie        0           0 bp    0.00 %
      TcMar-Tigger       0           0 bp    0.00 %

Unclassified:       560941    66374119 bp   31.69 %

Total interspersed repeats:   66374119 bp   31.69 %

Small RNA:               0           0 bp    0.00 %

Satellites:              0           0 bp    0.00 %
Simple repeats:     290711    15358198 bp    7.33 %
Low complexity:          0           0 bp    0.00 %

* most repeats fragmented by insertions or deletions have been counted as one element

The query species was assumed to be homo
RepeatMasker Combined Database: Dfam_Consensus-20181026

run with rmblastn version 2.6.0+
The query was compared to unclassified sequences in ".../OB_100DEC_repeats_filtered1.fa"

I used RepeatScout to generate the library, and this was my command:

./RepeatMasker -s -lib /home/RepeatScout-1.0.5/OB_100DEC_repeats_filtered1.fa /home/Workdirectory/OB_100DEC.fa

Can anyone explain why no TEs are reported in the classified categories? The genome FASTA file is about 200 Mb and consists of the 100 largest contigs of my assembly. Thank you.

I think the TE type information is taken from the FASTA headers of the repeat library, where the class and family follow a "#" after the repeat name, for example:

>SINEC2A2_CF#SINE/tRNA RepbaseID: SINEC2A2_CFXX

Can you check whether the FASTA headers of your repeat library look like this?
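One way to run that check on a whole library is sketched below. It assumes the class tag follows the `name#Class/Family` pattern of the example header above; the function names and the regular expression are illustrative and are not part of RepeatMasker.

```python
import re

# Assumed convention, taken from the example header: ">name#Class/Family ...".
HEADER_RE = re.compile(r"^>(?P<name>[^#\s]+)#(?P<cls>[^/\s]+)(?:/(?P<family>\S+))?")


def parse_header(line):
    """Return (name, class, family) if the header has a '#Class' tag, else None."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("cls"), match.group("family")


def unclassified_headers(fasta_lines):
    """Collect the header lines that carry no class annotation."""
    return [
        line.strip()
        for line in fasta_lines
        if line.startswith(">") and parse_header(line) is None
    ]
```

Calling `unclassified_headers(open(path))` on a library file (with `path` being whatever library you pass to `-lib`) lists every entry that would have no class to report. If that list covers the whole library, a summary table with everything under "Unclassified" is what I would expect.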

Reply by microfuge ★ 1.9k, 5.6 years ago

Nope. I generated the library myself using RepeatScout, and the FASTA headers look like this:

>R=3 (RR=4. TRF=0.000 NSEG=0.000)
TAAGGCGGCGAGCTGGCAGAATCGTTAGCACGCCGGGCGAAATGCTTAGCGGTATTTCGTCTGTCTTTACGTTCTGAGTT
CAAATTCCGCCGAGGTCGACTTTGCCTTTCATCCTTTCGGGGTCGATAAAATAAGTACCAGTTGAGCACTGGGGTCGATG
TAATCGACTTACCCCCTCCCCCAAAATTTCTGGCCTTGTGCCTATATTAGAAACGATTATT
>R=4 (RR=5. TRF=0.122 NSEG=0.226)
ACACACACACACACACACACACACACATATATATATATATACATATATACGACGGGCTTCTTTCAGTTTCCGTCTACCAA
ATCCACTCACAAGGCTTTGGTCGGCCCGAGGCTATAGTAGAAGACACTTGCCCAAGGTGCCACGCAGTGGGACTGAACCC
GGAACCATGTGGTTGGTAAGCAAGCTACTTACCACACAGCCACTCCTGCGCCTATATATAT
>R=6 (RR=7. TRF=0.134 NSEG=0.247)
TTGTTTCAGTCATTTGACTGCGGCCATGCTGGAGCACCGCCTTTAGTCGAGCAAATCGACCCCAGGACTTATTCTTTGTA
AGCCTAGTACTTATTCTATCGGTCTCTTTTGCCGAACCGCTAAGTTACGGGGACGTAAACACACCAGCATCGGTTGTCAA
GCGATGTTGGGGGGACAAACACAGACACACAAACACACACACACACATACATATATATATATATATATATA

and so on.

Reply by Chironex ▴ 50, 5.6 years ago

Thanks and sorry that I don't have a direct answer.

What we do is run RepeatModeler (http://www.repeatmasker.org/RepeatModeler/), then use queryRepeatDatabase.pl, which comes with RepeatMasker, to dump the known repeats for the species group we work on (e.g. -species Carnivora). We then combine the two repeat libraries (the one from RepeatModeler and the one from queryRepeatDatabase.pl) into a single repeat library file for masking.
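The combining step can be as simple as concatenating the two FASTA files. Here is a minimal sketch, assuming both libraries are plain FASTA. The function name and the file names in the usage note are hypothetical, and skipping records whose header line was already seen is my own addition, not something the tools require.

```python
from pathlib import Path


def combine_libraries(paths, out_path):
    """Concatenate FASTA libraries, skipping records whose header was already seen.

    Returns the number of records written.
    """
    seen = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            keep = False  # lines before the first header are ignored
            for line in Path(path).read_text().splitlines():
                if line.startswith(">"):
                    keep = line not in seen
                    if keep:
                        seen.add(line)
                        kept += 1
                if keep and line:
                    out.write(line + "\n")
    return kept
```

For example, `combine_libraries(["modeler_families.fa", "known_repeats.fa"], "combined_lib.fa")` would write one merged file, which can then be passed to RepeatMasker with `-lib` in the same way as the custom library in the original command.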

Also, in case you installed RepeatMasker yourself: the repeat libraries need to be downloaded separately from Repbase (https://www.girinst.org/repbase/), which requires registration, and RepeatMasker then has to be configured to use them.

I forgot this step once and got very low masking, which was resolved after downloading the repeat libraries from Repbase.

Reply by microfuge ★ 1.9k, 5.6 years ago

Hi! I downloaded RepBase from girinst months ago. Now I can't download it again, even though I'm registered, because it requires a subscription from my institute and I don't know how to obtain one. Anyway, as I said, I have this RepBase library installed with the program. As you can see in my message above, the output contains the lines "The query species was assumed to be homo" and "RepeatMasker Combined Database: Dfam_Consensus-20181026". I don't understand what the command line to run the program should be. Could you help me by showing an example?

Reply by Chironex ▴ 50, 5.6 years ago

OK. You can check whether your installed repeat libraries contain different types of TEs with a script that comes with RepeatMasker:

queryRepeatDatabase.pl -species human -stat

This should list the different TE types (LINE, SINE, ...). If the output does not contain these TE types, then the result you are getting would make sense.
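If you prefer to look at a library file directly (your custom one, or a FASTA dump from queryRepeatDatabase.pl), a quick tally of the class tags gives a similar picture. This is only a sketch: it assumes the `name#Class/Family` header convention shown earlier in the thread, and the function name and the "no class tag" label are made up for illustration.

```python
from collections import Counter


def tally_classes(fasta_lines):
    """Count library entries per TE class, based on '#Class/Family' header tags."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if "#" in name:
            # Keep only the class, e.g. "SINE" from "SINEC2A2_CF#SINE/tRNA".
            cls = name.split("#", 1)[1].split("/", 1)[0]
        else:
            cls = "no class tag"
        counts[cls] += 1
    return counts
```

Running `tally_classes(open(path))` on a library where every entry falls under "no class tag" would be consistent with a summary table that reports only "Unclassified" repeats.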

Reply by microfuge ★ 1.9k, 5.6 years ago

OK. In this RepBase library I found two .embl files containing different types of TEs. They are not in FASTA format. Are they usable? The species in my analysis is an invertebrate (a cephalopod), so which species should I use? Thank you again.

Reply by Chironex ▴ 50, 5.6 years ago