2.0 years ago
ProtExcluder.pl is a classic pipeline used to exclude homologs of annotated proteins from a library of interspersed repeat elements, prior to repeat annotation. The goal is to avoid overmasking CDS in the target genome (misinterpreting genes as parts of interspersed repeats).
Are there any other tools for the same purpose?
Unfortunately, I have been struggling for days to deploy this pipeline (https://github.com/NBISweden/ProtExcluder/issues), and it eliminates two thirds of my repeat library.
What species are you working on?
It is a non-model rodent (https://www.ncbi.nlm.nih.gov/assembly/GCA_026167925.1).
Everything else that you described in your post works just fine (using your conda env), but then it fails at the ProtExcluder stage. I tried running it script by script, and something weird happens with esl-fetch (which I have installed system-wide).
From what you say, it does not actually fail; it just removes more than you would expect. Maybe that is due to some particularity of the species you are studying.
I doubt it. One, it is a rodent, so there is nothing too peculiar about it. Two, as I said, there is a very specific problem with esl-fetch: its output file contains a bunch of error messages rather than the kind of output one would expect. Anyway, to answer my own question, I found a viable alternative: https://blaxter-lab-documentation.readthedocs.io/en/latest/filter-repeatmodeler-library.html
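For anyone landing here later, the general idea behind this kind of filtering can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not the code from the linked page and not what ProtExcluder does internally: it assumes you have already searched the repeat library against a protein set (for example with blastx) and saved the default 12-column tabular output, with the repeat consensus as the query. The function names and the E-value cutoff are hypothetical. It also drops whole consensus sequences, which is cruder than trimming only the matching region, and it assumes the protein set has already been cleaned of transposon-derived proteins.

```python
import io


def read_hit_ids(blast_tab, max_evalue=1e-10):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Assumes default BLAST tabular columns, where column 1 is the query ID
    and column 11 is the E-value. The cutoff is an illustrative choice.
    """
    hits = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def filter_fasta(fasta, exclude_ids):
    """Yield FASTA lines, skipping records whose ID is in exclude_ids.

    The record ID is the first whitespace-delimited token after '>'.
    """
    keep = True
    for line in fasta:
        if line.startswith(">"):
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id not in exclude_ids
        if keep:
            yield line


# Tiny in-memory demo with made-up data; with real files, pass open file handles.
demo_hits = io.StringIO(
    "famA#Unknown\tprot1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "famB#Unknown\tprot2\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.5\t30\n"
)
demo_library = io.StringIO(">famA#Unknown\nACGT\nACGT\n>famB#Unknown\nGGCC\n")

filtered = "".join(filter_fasta(demo_library, read_hit_ids(demo_hits)))
print(filtered)
```

In the demo only famA has a hit that passes the cutoff, so it is removed and famB is kept. If a large fraction of the library disappears, it is worth checking which protein hits are responsible before trusting the result.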