Question

Using RepeatModeler and RepeatMasker for Multiple Gastropod Genomes: Is a Single Repeat Library Sufficient?

8 months ago

Rohan ▴ 40

Hello everyone,

I'm currently working on the genomes of 30 taxa within Gastropoda, and I have the complete genome sequences for each species. My aim is to functionally annotate each genome. The pipeline I'm using involves:

1. Running RepeatModeler to build a custom repeat library.
2. Using RepeatMasker to mask repetitive elements across each genome.
3. Proceeding with BRAKER3 for gene prediction after masking.

I have a question about optimizing this process:

Do I need to generate a RepeatModeler library for each species individually? Since all taxa belong to Gastropoda, would creating a custom repeat library for each species give significant benefits over building a library from one or a subset of these genomes? My concern is about computational time and redundancy in repeats that might be highly similar across these taxa.

Any insights or suggestions on whether I should stick to one model or customize for each species would be greatly appreciated. Thank you!

genome-annotation RepeatModeller RepeatMasker • 711 views

updated 8 months ago by Mensur Dlakic ★ 29k • written 8 months ago by Rohan ▴ 40

score 2 · Accepted Answer · 2024-10-29

Since all taxa belong to Gastropoda, would creating a custom repeat library for each species give significant benefits over building a library from one or a subset of these genomes?

It depends on whether you are interested in doing this fast or doing it well. I would do them individually unless they were near-identical strains of the same species.
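
As a rough illustration of what "doing them individually" could look like for the pipeline described in the question, the sketch below only builds and prints the per-species command lines; it does not run anything. The function names (`repeat_commands`, `print_plan`), the `repeats/` directory layout, the example FASTA paths, and the thread count are assumptions made for illustration. It also assumes a RepeatModeler release that accepts `-threads` (older releases used `-pa`) and that each set of commands is run from inside the species' own working directory, so that the `<name>-families.fa` library is found there.

```python
import shlex
from pathlib import Path


def repeat_commands(genome_fasta, threads=16, out_root="repeats"):
    """Build (but do not run) the repeat-annotation commands for one genome.

    Returns (workdir, commands). The commands are meant to be executed
    from inside workdir, which is a hypothetical per-species directory.
    """
    genome = Path(genome_fasta).resolve()    # absolute path, usable from any workdir
    species = genome.stem                    # "species_a" from species_a.fa
    workdir = Path(out_root) / species
    library = f"{species}-families.fa"       # consensus library written by RepeatModeler
    commands = [
        # 1. Build the sequence database RepeatModeler works on.
        ["BuildDatabase", "-name", species, str(genome)],
        # 2. De novo repeat family discovery for this species only.
        ["RepeatModeler", "-database", species, "-threads", str(threads)],
        # 3. Soft-mask (-xsmall) the genome with its own library; the result
        #    is written as <genome file name>.masked in the -dir directory.
        ["RepeatMasker", "-lib", library, "-xsmall", "-gff",
         "-pa", str(threads), "-dir", ".", str(genome)],
    ]
    return workdir, commands


def print_plan(genomes, threads=16):
    """Print one block of shell-quoted commands per species."""
    for fasta in genomes:
        workdir, commands = repeat_commands(fasta, threads)
        print(f"# {Path(fasta).stem}: run inside {workdir}")
        for cmd in commands:
            print(shlex.join(cmd))


if __name__ == "__main__":
    # Hypothetical input files; replace with the real 30 genome FASTAs.
    print_plan(["genomes/species_a.fa", "genomes/species_b.fa"])
```

Because each species gets its own database and library, the 30 runs are independent and can be submitted as separate cluster jobs, which addresses part of the wall-clock concern. The soft-masked FASTA from step 3 is what would then go to BRAKER3.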

My concern is about computational time and redundancy in repeats that might be highly similar across these taxa.

You have information that you didn't share with us: how similar are these groups? I would imagine that genomes with more than 80-90% sequence identity are likely to have near-identical repeats. Still, some repeats may be unique to particular groups, so I would still run each species individually, as I proposed above.