Successfully identified and masked repeats using RepeatMasker. What next?

6 months ago

Vijith ▴ 90

I've successfully identified and masked transposable elements (TEs) in my assembled genome using RepeatMasker. The terminal output during the run looked like this:

identifying matches to TE_monocot.fasta sequences in batch 32746 of 32746
identifying Simple Repeats in batch 32746 of 32746

RepeatMasker produced three output files: file.fasta.masked, which is the same size as the original input FASTA file; file.fasta.out, which is about 700 MB; and file.fasta.tbl. I understand that file.fasta.masked is the final repeat-masked version. My questions are:

What does "identifying Simple Repeats in batch" mean? Does 32746 indicate the number of identified TE repeats?
What downstream steps can I take from here? For now, I assume gene prediction with Augustus is a good next move. Any other suggestions?
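
For context on what file.fasta.out holds, here is a minimal sketch of how masked bases per repeat class could be tallied from it. It assumes the standard whitespace-delimited RepeatMasker .out layout (query begin and end in columns 6 and 7, repeat class/family in column 11); the function name `summarize_rm_out` is just an illustrative name, not part of RepeatMasker.

```python
from collections import Counter


def summarize_rm_out(lines):
    """Tally annotated bases per repeat class/family from RepeatMasker .out lines.

    Assumes the standard .out column order: score, divergence, deletions,
    insertions, query name, query begin, query end, (left), strand,
    repeat name, repeat class/family, ...
    Overlapping annotations are counted more than once, so totals are approximate.
    """
    bases = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header lines, whose first field is not a score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10]
        bases[repeat_class] += end - begin + 1
    return bases
```

Something like `summarize_rm_out(open("file.fasta.out"))` should then give a per-class count that can be checked against the summary in file.fasta.tbl.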

sequence annotation repeatmasker illumina assembly
