I'm analyzing mygenome.fa with RepeatScout and RepeatMasker to find transposable elements. I produced a library of repetitive elements with RepeatScout and then masked the TEs using RepeatMasker.
./build_lmer_table -l 14 -sequence mygenome.fa -freq mygenome.freq
./RepeatScout -sequence mygenome.fa -output mygenome_repeats.fa -freq mygenome.freq -l 14
cat mygenome_repeats.fa | ./filter-stage-1.prl > mygenome_repeats_filtered1.fa
./RepeatMasker -s -lib mygenome_repeats_filtered1.fa mygenome.fa
Generate a masked genome using (non-low-complexity, non-tandem) repeats
cat mygenome_repeats_filtered1.fa | ./filter-stage-2.prl --cat mygenome.fa.out >mygenome_repeats_filtered2.fa
Filter out all (non-low-complexity, non-tandem) repeats that occur fewer than 10 times
./RepeatMasker -pa 4 -s -lib mygenome_repeats_filtered2.fa -nolow -norna -no_is -gff mygenome.fa
After this I ran RepeatMasker on the filtered library:
./RepeatMasker -no_is mygenome_repeats_filtered2.fa
and I produced:
mygenome_repeats_filt2.fa.masked mygenome_repeats_filt2.fa.tbl
mygenome_repeats_filt2.fa.cat mygenome_repeats_filt2.fa.out
I would like to know a way to find the location of the transposable elements masked by RepeatMasker in my genome.
Suppose this is part of my repetitive element library produced with RepeatScout as shown above:
less mygenome_repeats_filt2.fa
>R=3 (RR=4. TRF=0.000 NSEG=0.000)
TAAGGCGGCGAGCTGGCAGAATCGTTAGCACGCCGGGCGAAATGCTTAGCGGTATTTCGTCTGTCTTTACGTTCTGAGTT
CAAATTCCGCCGAGGTCGACTTTGCCTTTCATCCTTTCGGGGTCGATAAAATAAGTACCAGTTGAGCACTGGGGTCGATG
TAATCGACTTACCCCCTCCCCCAAAATTTCTGGCCTTGTGCCTATATTAGAAACGATTATT
>R=4 (RR=5. TRF=0.122 NSEG=0.226)
ACACACACACACACACACACACACACATATATATATATATACATATATACGACGGGCTTCTTTCAGTTTCCGTCTACCAA
ATCCACTCACAAGGCTTTGGTCGGCCCGAGGCTATAGTAGAAGACACTTGCCCAAGGTGCCACGCAGTGGGACTGAACCC
GGAACCATGTGGTTGGTAAGCAAGCTACTTACCACACAGCCACTCCTGCGCCTATATATAT
>R=6 (RR=7. TRF=0.134 NSEG=0.247)
TTGTTTCAGTCATTTGACTGCGGCCATGCTGGAGCACCGCCTTTAGTCGAGCAAATCGACCCCAGGACTTATTCTTTGTA
AGCCTAGTACTTATTCTATCGGTCTCTTTTGCCGAACCGCTAAGTTACGGGGACGTAAACACACCAGCATCGGTTGTCAA
GCGATGTTGGGGGGACAAACACAGACACACAAACACACACACACACATACATATATATATATATATATATA
From the file mygenome_repeats_filt2.fa.out I can see that R=3 contains a transposable element:
   SW  perc perc perc  query     position in query    matching  repeat         position in repeat
score  div. del. ins.  sequence  begin  end  (left)   repeat    class/family   begin  end  (left)  ID
  348  20.6  0.0  0.0  R=3          78  140    (81) + AmnSINE2  SINE/tRNA-Deu     67  129   (229)  234
As you can see, I have the coordinates of this element within the library sequence, but I would like to find its exact location in the file that contains my assembled genome.
frida: Have you checked the RepeatMasker help to see if there is a way to change the repeat nucleotides to lower case (or N's in the case of hard masking) when they are found?
No, I didn't. I can check it.
Is the "position in query begin/end" not telling you what you are looking for, or am I missing something?
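A minimal Python sketch of reading those columns. It assumes a standard RepeatMasker .out table (whitespace-separated, header lines that do not start with a number) and turns each row into a BED line. The names parse_rm_out and to_bed are made up for illustration. If the table comes from a RepeatMasker run on mygenome.fa itself (for example mygenome.fa.out), the query column should hold scaffold names, so begin/end would be genome coordinates. In a run on the repeat library they are coordinates inside R=3.

```python
from typing import Iterable, Iterator, NamedTuple


class RepeatHit(NamedTuple):
    query: str    # sequence that was masked (scaffold name or library entry)
    start: int    # 1-based, inclusive, as printed by RepeatMasker
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-" (RepeatMasker prints "C" for the complement strand)
    repeat: str   # name of the matching repeat
    family: str   # repeat class/family


def parse_rm_out(lines: Iterable[str]) -> Iterator[RepeatHit]:
    """Yield one RepeatHit per data row of a RepeatMasker .out table."""
    for line in lines:
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        strand = "-" if fields[8] == "C" else "+"
        yield RepeatHit(fields[4], int(fields[5]), int(fields[6]),
                        strand, fields[9], fields[10])


def to_bed(hit: RepeatHit) -> str:
    """Format a hit as a 6-column BED line (0-based start, end exclusive)."""
    return f"{hit.query}\t{hit.start - 1}\t{hit.end}\t{hit.repeat}\t0\t{hit.strand}"


if __name__ == "__main__":
    # The row quoted in the question; a real run would read the .out file instead.
    row = "348 20.6 0.0 0.0 R=3 78 140 (81) + AmnSINE2 SINE/tRNA-Deu 67 129 (229) 234"
    for hit in parse_rm_out([row]):
        print(to_bed(hit))
```

With a real file, something like `with open("mygenome.fa.out") as fh: hits = list(parse_rm_out(fh))` should work, assuming that file exists from the genome run.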
As genomax mentioned, using the hard/soft masking approach (and then analysing the masked genome) will give the exact locations of repeats in the genome anyway. Soft masking has to be activated explicitly (the -xsmall option); by default RepeatMasker replaces the repeats with N's in the .masked file.
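One way to analyse a masked genome is to scan each scaffold for runs of lower-case letters (soft masking) or N's (hard masking). The sketch below is only an illustration: read_fasta and masked_intervals are hypothetical helper names, and it cannot tell hard-masked N's apart from N's that were already in the assembly as scaffold gaps.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# A masked run is any stretch of lower-case bases (soft mask) or N's (hard mask).
MASKED_RUN = re.compile(r"[a-zN]+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_intervals(seq: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) pairs for every masked run."""
    return [(m.start() + 1, m.end()) for m in MASKED_RUN.finditer(seq)]


if __name__ == "__main__":
    # Toy record for illustration only; a real run would open the .masked file.
    toy = [">scaffold_1 example", "ACGTacgtAC", "NNNG"]
    for name, seq in read_fasta(toy):
        for start, end in masked_intervals(seq):
            print(name, start, end, sep="\t")
```

The intervals say where something was masked, but not which repeat family matched; that information is only in the .out table.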
I have to find this option in the help. Anyway, I'm interested in finding the position in mygenome.fa, which is divided into scaffolds, not in the query, which is R=...
We are talking about this RepeatMasker, correct?
Hi, I think you have the repeat sequences in the output. So you just have to align your repeat sequences against your genome, with BLAST for example.
Best
I modified my post with some additional information to better explain the problem.
So you can BLAST mygenome_repeats_filt2.fa against your_genome.fa.
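If you go the BLAST route, the tabular output is probably the easiest to turn into genome coordinates. The sketch below assumes BLAST+ was run with -outfmt 6 and its default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), for example `blastn -query mygenome_repeats_filt2.fa -subject mygenome.fa -outfmt 6 -out hits.tsv`, where hits.tsv is just a placeholder name. parse_blast_tab and GenomeHit are hypothetical names, and the example row is invented.

```python
from typing import Iterable, Iterator, NamedTuple


class GenomeHit(NamedTuple):
    repeat: str      # query id, i.e. the library entry such as R=3
    scaffold: str    # subject id, i.e. the scaffold in the genome
    start: int       # 1-based, inclusive, always <= end
    end: int
    strand: str      # "-" when BLAST reports sstart > send
    identity: float  # percent identity
    length: int      # alignment length


def parse_blast_tab(lines: Iterable[str]) -> Iterator[GenomeHit]:
    """Yield genome intervals from default BLAST -outfmt 6 rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        sstart, send = int(f[8]), int(f[9])
        # For nucleotide searches, minus-strand hits have sstart > send.
        strand = "+" if sstart <= send else "-"
        yield GenomeHit(f[0], f[1], min(sstart, send), max(sstart, send),
                        strand, float(f[2]), int(f[3]))


if __name__ == "__main__":
    # Invented row, for illustration only.
    row = "R=3\tscaffold_1\t98.5\t200\t3\t0\t1\t200\t5200\t5001\t1e-90\t350"
    for hit in parse_blast_tab([row]):
        print(hit.scaffold, hit.start, hit.end, hit.strand, hit.repeat, sep="\t")
```

Short or low-identity hits would still need filtering, for example on identity and length, before treating them as copies of the repeat.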