Hello everyone,
I'm currently working on the genomes of 30 taxa within Gastropoda, and I have the complete genome sequences for each species. My aim is to functionally annotate each genome. The pipeline I'm using involves:
- Running RepeatModeler to build a custom repeat library.
- Using RepeatMasker to mask repetitive elements across each genome.
- Proceeding with BRAKER3 for gene prediction after masking.
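To make the three steps concrete, here is a minimal Python sketch that only builds the command lines for each species (a dry run) and does not execute anything. It assumes RepeatModeler 2, which writes its library as `<db>-families.fa`. It soft-masks with RepeatMasker's `-xsmall`, since BRAKER expects a soft-masked genome. It also assumes BRAKER3 mode with a protein set and an RNA-seq BAM per species, which the post does not state. The function name `plan_species`, the `shared_library` switch, and all file paths are hypothetical. Threading options are left out because their flag names have changed between tool versions, so check your installed versions before adding them.

```python
import shlex
from pathlib import Path


def plan_species(species, genome, proteins, rnaseq_bam, shared_library=None):
    """Build command lines for one species (hypothetical helper, dry run only).

    Commands are meant to run with cwd set to a per-species directory, so
    pass absolute paths for genome, proteins and rnaseq_bam.
    If shared_library is given, the RepeatModeler steps are skipped and that
    existing library is reused instead of building a per-species one.
    """
    db = f"{species}_db"
    commands = []
    if shared_library is None:
        # Per-species de novo library; RepeatModeler 2 writes <db>-families.fa
        commands.append(["BuildDatabase", "-name", db, genome])
        commands.append(["RepeatModeler", "-database", db])
        library = f"{db}-families.fa"
    else:
        library = shared_library
    # Soft-mask (-xsmall): repeats become lowercase instead of Ns
    commands.append(
        ["RepeatMasker", "-lib", library, "-xsmall", "-dir", "repeatmasker", genome]
    )
    # RepeatMasker names its masked output <input file name>.masked
    masked = f"repeatmasker/{Path(genome).name}.masked"
    # BRAKER3 mode: masked genome + protein set + RNA-seq alignments (assumed inputs)
    commands.append([
        "braker.pl",
        f"--genome={masked}",
        f"--prot_seq={proteins}",
        f"--bam={rnaseq_bam}",
        f"--species={species}",
        "--workingdir=braker",
    ])
    return commands


if __name__ == "__main__":
    # Hypothetical input layout; replace with your real files.
    data = Path("/data/gastropods")
    for sp in ["Species_A", "Species_B"]:
        plan = plan_species(
            sp, str(data / f"{sp}.fa"), str(data / "proteins.fa"), str(data / f"{sp}.bam")
        )
        for cmd in plan:
            print(f"[{sp}] {shlex.join(cmd)}")
```

To run a plan for real, each list could be passed to `subprocess.run(cmd, cwd=species_dir, check=True)`. Comparing the two modes shows the computational trade-off raised in the question. The shared-library variant drops the BuildDatabase and RepeatModeler steps, usually among the most time-consuming parts, for every species except the one(s) used to build the library.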
I have a question about optimizing this process:
Do I need to generate a RepeatModeler library for each species individually? Since all taxa belong to Gastropoda, would creating a custom repeat library for each species give significant benefits over building a library from one or a subset of these genomes? My concern is about computational time and redundancy in repeats that might be highly similar across these taxa.
Any insights or suggestions on whether I should stick to one library or build one for each species would be greatly appreciated. Thank you!
Thank you for the insights! These taxa are indeed entirely different species within the same family, so it's very likely that their genomes share less than 80-90% identity. Given this, I’ll proceed with creating custom repeat libraries for each species.