Successfully identified and masked repeats using RepeatMasker (RM). What next?
6 months ago
Vijith ▴ 90

I have identified and masked the transposable elements (TEs) in my assembled genome using RepeatMasker. The run printed messages like the following in the terminal window:

identifying matches to TE_monocot.fasta sequences in batch 32746 of 32746
identifying Simple Repeats in batch 32746 of 32746

The run produced three output files: file.fasta.masked, which is the same size as the original input FASTA file; file.fasta.out, which is ~700 MB; and file.fasta.tbl. I understand that file.fasta.masked is the final repeat-masked version. My questions are:
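A quick sanity check on file.fasta.masked is to count how many bases were actually masked. The sketch below is only an illustration: the function name masked_fraction is made up, and it assumes masking appears either as N/X characters (hard masking) or as lowercase letters (soft masking, e.g. when RepeatMasker is run with -xsmall). Any N already present in the assembly as gap filler is counted as hard-masked too, so treat the hard-masked figure as an upper bound.

```python
def masked_fraction(lines):
    """Return (hard_masked, soft_masked, total) base counts for FASTA lines.

    Hypothetical helper: 'lines' is any iterable of FASTA text lines,
    for example an open file handle.
    """
    hard = soft = total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and sequence headers.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        # Hard-masked positions are written as uppercase N (or X).
        hard += sum(1 for c in line if c in "NX")
        # Soft-masked positions are written in lowercase.
        soft += sum(1 for c in line if c.islower())
    return hard, soft, total


# Small in-memory example; for a real run, pass an open file instead:
#   with open("file.fasta.masked") as fh:
#       print(masked_fraction(fh))
example = [">seq1", "ACGTNNNNacgt", ">seq2", "NNNNACGT"]
hard, soft, total = masked_fraction(example)
print(f"hard-masked: {hard}/{total}, soft-masked: {soft}/{total}")
```

Comparing these counts with the summary in file.fasta.tbl should show whether the masked file matches what was reported.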

  1. What does "identifying Simple Repeats in batch 32746 of 32746" mean? Is 32746 the number of TE repeats identified?
  2. What downstream steps can I take next? For now, I assume gene prediction with Augustus is a good next move. Any other suggestions?
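One way to see how many repeats were annotated, independent of the batch counter in the log, is to count the records in file.fasta.out. The sketch below is a rough illustration, not an official parser: count_repeat_classes is a made-up name, and it assumes the usual whitespace-separated RepeatMasker .out layout, in which data rows start with an integer score and the repeat class/family is the 11th column.

```python
from collections import Counter


def count_repeat_classes(lines):
    """Count annotated hits per repeat class/family in .out-style lines.

    Hypothetical helper: assumes data rows have at least 15
    whitespace-separated fields and begin with an integer score.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with an integer score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        counts[fields[10]] += 1  # 11th column: repeat class/family
    return counts


# Small in-memory example; for a real run, pass an open file instead:
#   with open("file.fasta.out") as fh:
#       counts = count_repeat_classes(fh)
example = [
    "   SW   perc perc perc  query     position in query    matching  repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family",
    "",
    " 1306 15.6  6.2  0.0 contig1  6563 6781 (22462) C MER7A    DNA/MER2_type (0) 336 103 1",
    "  279  3.0  0.0  0.0 contig1  7719 7751 (21492) + (TTTTA)n Simple_repeat 1 33 (0) 2",
]
for repeat_class, n in count_repeat_classes(example).most_common():
    print(repeat_class, n)
```

The total over all classes is the number of annotated hits, and the per-class breakdown can be compared with the table in file.fasta.tbl.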
sequence annotation repeatmasker illumina assembly • 208 views