Question

RepeatModeler contig size filter and parallelization

5.7 years ago

Anand Rao ▴ 640

The README file of the latest RepeatModeler release (version 1.0.11) says this:

o Genomes with numerous short contigs ( Diatom for example ) will take longer to BLAST than larger genomes with larger contigs. This is an optimization problem left for future releases.

o This program is not parallelized. It can only run on one node. This is something we are considering for future releases.

It is not clear whether this README has been updated since earlier versions, because it still contains benchmarks and statistics for the rather old versions 1.0.0, 1.0.2 and 1.0.3.

Here are some questions:

1. An earlier thread here on Biostars, "RepeatModeler cannot be run in parallel", discusses how one bottleneck could be relieved. Has a work-around been found since then? Does any user know of one? If so, could you please share it?

2. For fragmented genomes, is there a rule of thumb for the contig size that should be used as a cutoff when filtering the input multi-FASTA?

I could not find any Q&A on these topics on the RepeatMasker / RepeatModeler GitHub page, so I am seeking help from the Biostars community. Perhaps past responders to RepeatModeler questions could share their thoughts as well? @microfuge, @Michael Dondrup, @Mehmet :) Thanks!

RepeatModeler Contig length Parallel • 1.2k views

score 0 · Answer 1 · 2019-03-22

5.7 years ago

Anand Rao ▴ 640

Answering question #2: This blog post suggests that filtering out sequences shorter than 5 kb would be appropriate. Any additional thoughts from forum members?
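For anyone who wants to try that cutoff, here is a minimal sketch of a length filter in plain Python (standard library only). It is not part of RepeatModeler; the function names and the demo sequences are made up for illustration, and the 5000 bp default simply mirrors the blog post's suggestion rather than any validated threshold.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=5000):
    """Keep only records whose sequence is at least min_len bases long.

    The 5000 default is an assumption taken from the suggested 5 kb cutoff.
    """
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` columns."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Tiny in-memory demo with made-up contigs (hypothetical names and lengths).
demo = io.StringIO(">long_contig\n" + "ACGT" * 1500 + "\n>short_contig\nACGTACGT\n")
kept = list(filter_by_length(read_fasta(demo), min_len=5000))
print("kept:", [name for name, _ in kept])
```

To run this on a real assembly, open the input and output files with `open()` and pass the handles to `read_fasta` and `write_fasta` in place of the in-memory demo. It is probably worth comparing how much of the total assembly length survives at a few different cutoffs before settling on one.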

Question #1 still remains unanswered. Anyone?
