Question

Build consensus sequences from repeat masker output

4.4 years ago

Amaranta Remedios ▴ 20

Hi,

I have a RepeatMasker output file for a new organism (a crustacean), and I want to use the transposable elements in this species to analyse piRNAs (using my own sequencing data, i.e. short reads).

The problem is that I would like consensus sequences for the transposable elements in this species, rather than every position in the genome where a transposon occurs. If the same transposon exists in 100 copies in the genome, it appears 100 times in the RepeatMasker output.
Ideally I would like to end up with a multi-FASTA file like the ones in Repbase, but I am a bit lost about how to use the RepeatMasker output to achieve this.

Any suggestions would be very helpful! Thanks

repeatmasker Transposable elements • 1.4k views

ADD COMMENT • link updated 4.2 years ago by bioinfo • 0 • written 4.4 years ago by Amaranta Remedios ▴ 20

I think the easiest way would be to convert the coordinates to a BED file and then use bedtools to extract the sequences from the genome FASTA. Once you have the FASTA sequences you can build a consensus.

ADD REPLY • link 4.4 years ago by Asaf 10k
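The coordinate conversion suggested above can be sketched as follows. This is a minimal sketch, not a definitive converter: it assumes the standard whitespace-separated RepeatMasker `.out` layout (query sequence, begin and end in columns 5 to 7, strand in column 9 with `C` for complement, repeat name and class/family in columns 10 and 11). The function name `rmsk_out_to_bed` and the `name#class` label are illustrative choices, not part of RepeatMasker or bedtools.

```python
def rmsk_out_to_bed(lines):
    """Convert RepeatMasker .out lines into BED6 rows (tab-joined strings)."""
    bed_rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines (their first field is not a number).
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom, begin, end = fields[4], int(fields[5]), int(fields[6])
        # RepeatMasker marks the reverse strand with "C" (complement).
        strand = "-" if fields[8] == "C" else "+"
        # Illustrative label: repeat name plus class/family.
        name = f"{fields[9]}#{fields[10]}"
        # .out coordinates are 1-based inclusive; BED starts are 0-based.
        # The score column is filled with a placeholder "0".
        bed_rows.append("\t".join([chrom, str(begin - 1), str(end), name, "0", strand]))
    return bed_rows
```

The resulting rows can be written to a file (here called `repeats.bed` for illustration) and passed to something like `bedtools getfasta -fi genome.fa -bed repeats.bed -s -name`, where `-s` reverse-complements minus-strand hits and `-name` uses the BED name column in the FASTA header. The exact header format differs between bedtools versions.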

Thanks for the comment. I have already extracted the FASTA sequences. I guess the way forward would be to do some sort of clustering on the sequences, but I am just not sure about that.

ADD REPLY • link 4.4 years ago by Amaranta Remedios ▴ 20

You should have the name of the repeat for each sequence. You can start by grouping on that and then build a consensus for each group.

ADD REPLY • link 4.4 years ago by Asaf 10k
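The grouping-then-consensus idea above can be sketched as follows. It rests on two assumptions that do not come from the thread: that FASTA headers start with the repeat name followed by `#` (a hypothetical convention), and that the copies within each group have already been aligned to equal length with a multiple sequence aligner. Both function names are illustrative, and a simple majority rule stands in for whatever consensus method is actually preferred.

```python
from collections import Counter, defaultdict


def group_by_repeat_name(records):
    """Group (header, sequence) pairs by the header text before the first '#'."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[header.split("#", 1)[0]].append(seq)
    return dict(groups)


def majority_consensus(aligned_seqs, gap="-"):
    """Majority-rule consensus of equal-length, already-aligned sequences."""
    if not aligned_seqs:
        raise ValueError("no sequences given")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_seqs):
        # Most frequent character in this column; ties go to the first one seen.
        base, _ = Counter(c.upper() for c in column).most_common(1)[0]
        # Columns where a gap is the majority are dropped from the consensus.
        if base != gap:
            consensus.append(base)
    return "".join(consensus)
```

Majority rule is the simplest option. For old, highly diverged transposon copies, the alignment step itself usually matters more than the consensus rule, so the alignments are worth inspecting by eye.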

score 0 · Answer 1 · 2020-09-21

There is a script shipped in the RepeatMasker directory that I think will solve your problem. It can be found here:

repeatMasker/util/queryRepeatDatabase.pl

You can use it to retrieve from the repeat database all repeats for the taxon you are interested in.

Run it as follows:

util/queryRepeatDatabase.pl -species YourSpecies  > YourSpecies_repetitions.lib