6 months ago
Vijith
I have finished identifying and masking transposable elements (TEs) in my assembled genome using RepeatMasker. The terminal output looked like this:
identifying matches to TE_monocot.fasta sequences in batch 32746 of 32746
identifying Simple Repeats in batch 32746 of 32746
The run produced several output files: file.fasta.masked, which is the same size as the original input FASTA file; file.fasta.out, which is ~700 MB; and a third file, file.fasta.tbl.
I understand that file.fasta.masked is the final repeat-masked version. My questions are:
- What does "identifying Simple Repeats in batch" mean? Does 32746 refer to the number of TE repeats identified?
- Which downstream analyses can I move on to? For now, I assume gene prediction with Augustus is a good next step. Any other suggestions?
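
Before moving downstream, one quick sanity check on file.fasta.masked is to measure how much of the sequence was actually masked. The following is only a minimal Python sketch, not part of RepeatMasker: the function name masked_fraction is hypothetical, and it assumes masked bases appear either as N/X (hard masking) or as lowercase letters (soft masking). Any N gaps already present in the assembly would be counted as hard-masked too.

```python
def masked_fraction(fasta_lines):
    """Return (hard_fraction, soft_fraction) for an iterable of FASTA lines.

    Assumption: hard-masked bases are N or X (either case) and
    soft-masked bases are any other lowercase letter. Pre-existing
    N gaps in the assembly are indistinguishable from hard masking here.
    """
    total = 0
    hard = 0
    soft = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        for base in line:
            if base in "NnXx":
                hard += 1
            elif base.islower():
                soft += 1
    if total == 0:
        return 0.0, 0.0
    return hard / total, soft / total
```

It could be called as masked_fraction(open("file.fasta.masked")), which streams the file line by line rather than loading the whole genome into memory. The summary in file.fasta.tbl should report a comparable masked percentage, so a large disagreement would suggest something went wrong.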