2.5 years ago
jaqx008
Hello,
I am trying to obtain all TEs from a genome using RepeatMasker. However, the result only returned simple repeats and low complexity TEs (1.71% total). All the other TE classes returned zero values. Am I doing something wrong? How can I recover all the other TE classes? See my command below:
RepeatMasker -pa 6 -species Branchiostoma belcheri genomic.fna > Masked_Bbe
Thanks
This is most likely a library issue: the TE library that RepeatMasker is searching with is basically non-existent. I had the same issue, which I think was due to my conda installation. Are you perhaps in the same boat?
I don't know how commonly studied your species is, but your alternative is to generate a de novo repeat library using RepeatModeler (the generic RepeatMasker repeat libraries may represent your species poorly anyway).
That makes sense. I did try RepeatModeler, but the sequences returned are not grouped into TE classes or types; they are just FASTA sequences representing the masked TEs. Is this typical with RepeatModeler?
I am not sure I understand. You are saying you tried de novo TE identification with RepeatModeler?
Yes, I did. But the output was a bunch of repeat sequences in FASTA format. The headers didn't say what type of TE class was discovered. This is the command I used after I created a db from the genome.
Do you have a better command I can try?
So all the repeats identified by RepeatModeler are labelled "unknown"? Otherwise the fasta header for each sequence would have an indication of the repeat type.
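A quick way to check this is to tally the labels in the FASTA headers. The sketch below assumes the usual RepeatModeler header layout, something like `>rnd-1_family-12#LINE/L2 ( ... )`, where the text after `#` is the classification; `tally_repeat_classes` is just a helper name made up for this example.

```shell
# Tally repeat classes from RepeatModeler-style FASTA headers.
# Assumption: headers look like ">rnd-1_family-12#LINE/L2 ( ... )".
tally_repeat_classes() {
    # $1: path to the families FASTA file
    grep '^>' "$1" |
        awk '{
            n = split($1, parts, "#")      # first token is name#Class/Family
            label = (n > 1) ? parts[2] : "unlabelled"
            sub(/\/.*/, "", label)         # keep the class, drop the family
            count[label]++
        }
        END { for (c in count) print count[c] "\t" c }' |
        sort -k1,1nr -k2,2
}
```

Calling `tally_repeat_classes Bbe-families.fa` (adjust the file name to whatever your run produced) prints one line per class with its count. If nearly everything lands under `Unknown`, the classification step found no matches for those families.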
Unfortunately, that is exactly how I would run RepeatModeler too, and the FASTA file should be called 'Bbe-families'. After running something like
BuildDatabase -name Bbe -engine ncbi Bbe.assembly.fa
of course
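For reference, the whole workflow discussed in this thread might be sketched as below. This is only a sketch under some assumptions: the `run` and `annotate_repeats` helpers are made up for this example, the library is assumed to be written as `<db>-families.fa`, and `-pa` is the thread option in the RepeatModeler versions I have used (newer releases may expect `-threads` instead, so check `RepeatModeler -h`). Setting `DRY_RUN=1` only prints the commands.

```shell
# Print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# annotate_repeats GENOME DB_NAME THREADS  (hypothetical helper)
annotate_repeats() {
    genome=$1; db=$2; threads=$3
    # 1. Build the database that RepeatModeler searches.
    run BuildDatabase -name "$db" -engine ncbi "$genome" &&
    # 2. De novo repeat family discovery (can take a long time).
    run RepeatModeler -database "$db" -pa "$threads" &&
    # 3. Mask the genome with the custom library instead of -species.
    run RepeatMasker -pa "$threads" -lib "${db}-families.fa" -gff "$genome"
}
```

For example, `DRY_RUN=1 annotate_repeats Bbe.assembly.fa Bbe 6` shows the three commands without running anything. The key point is the `-lib` option in the last step: RepeatMasker then annotates with the de novo families rather than relying on whatever species library the installation ships with.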