Pipeline transposable elements
5.5 years ago
SeaStar ▴ 50

Hello! I'm looking for suggestions on the best current pipeline(s) for finding transposable elements in a newly assembled genome of a species with no reference genome (octopus). I don't want to use RepeatMasker. Can anyone help me?

transposon • 3.3k views

There is nothing wrong with RepeatMasker, but it is indeed not a tool for finding "new" repeats. You should use it for what it is intended for, namely masking a genome using a library of TEs.


The RepeatModeler + RepeatMasker pipeline is the most common approach for identifying new repeats genome-wide. A recent paper describes RepeatModeler 2, which also adds a module for structure-based discovery of LTR elements. Internally it uses RECON and RepeatScout. Why not give it a try?
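As a rough sketch of that workflow, the Python snippet below only assembles the three command lines (BuildDatabase, RepeatModeler, RepeatMasker) and prints them; it does not run anything. The database name `octopus`, the file name `genome.fa` and the thread count are placeholders of mine, and the option names (in particular `-pa`, which newer RepeatModeler releases replace with `-threads`) should be checked against the versions you have installed.

```python
import shlex


def repeatmodeler_commands(genome_fasta, db_name, threads=8, ltr_struct=True):
    """Return the three command lines as argument lists (nothing is executed)."""
    # Step 1: build the sequence database that RepeatModeler works on.
    build_db = ["BuildDatabase", "-name", db_name, genome_fasta]

    # Step 2: de novo discovery of repeat families.
    model = ["RepeatModeler", "-database", db_name, "-pa", str(threads)]
    if ltr_struct:
        # Optional structure-based LTR discovery (RepeatModeler 2).
        model.append("-LTRStruct")

    # Step 3: mask/annotate the genome with the de novo library.
    # RepeatModeler writes its consensus library as <db_name>-families.fa.
    families = f"{db_name}-families.fa"
    mask = ["RepeatMasker", "-lib", families, "-pa", str(threads),
            "-xsmall", "-gff", genome_fasta]

    return [build_db, model, mask]


if __name__ == "__main__":
    # Placeholder inputs; replace with your own assembly and database name.
    for cmd in repeatmodeler_commands("genome.fa", "octopus"):
        print(shlex.join(cmd))
```

Once the tools are on your PATH, each list can be handed to `subprocess.run(cmd, check=True)` in turn.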


Indeed. The only caution here is that RepeatModeler is known to be quite aggressive in TE detection and might thus report false-positive repeats (typical example: well-conserved protein domains might sneak in as well). So back-screening the RepeatModeler results against a database of known, genuine proteins will certainly help here.
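A minimal sketch of such a back-screen, assuming you have already run BLASTX of the repeat library against a protein database and saved the standard 12-column tabular output. The function names and the thresholds (`max_evalue`, `min_aln_len`) are illustrative choices of mine, not recommended values.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def protein_hit_ids(blast_handle, max_evalue=1e-10, min_aln_len=50):
    """Collect query IDs with at least one hit passing both thresholds.

    Expects tab-separated columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    flagged = set()
    for line in blast_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid, aln_len, evalue = cols[0], int(cols[3]), float(cols[10])
        if evalue <= max_evalue and aln_len >= min_aln_len:
            flagged.add(qseqid)
    return flagged


def split_library(fasta_handle, flagged):
    """Split a repeat library into (kept, suspect) lists of (header, seq)."""
    kept, suspect = [], []
    for header, seq in read_fasta(fasta_handle):
        seq_id = header.split()[0]  # BLAST reports the ID up to the first space
        (suspect if seq_id in flagged else kept).append((header, seq))
    return kept, suspect


if __name__ == "__main__":
    # Tiny in-memory example with made-up family names and hits.
    blast = io.StringIO(
        "fam1#Unknown\tP1\t80.0\t120\t0\t0\t1\t360\t1\t120\t1e-40\t200\n"
    )
    library = io.StringIO(">fam1#Unknown\nACGTACGT\n>fam2#LTR/Gypsy\nTTTTAAAA\n")
    kept, suspect = split_library(library, protein_hit_ids(blast))
    print(len(kept), "kept;", len(suspect), "flagged for review")
```

Note that general protein databases also contain real TE proteins (transposases, reverse transcriptases), so flagged families are better treated as candidates for manual review than discarded automatically.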


Good idea, I had never thought about it, so I am going to BLASTX my repeat library against NR or SwissProt. I am just wondering about a few things: RepeatMasker/RepeatModeler seems to be the de facto standard for genome annotation; is that justified? Also, repeat detection still seems to underpredict repeat content, with some papers claiming that repeats may make up as much as 70% of the human genome.


Well, yes, I think that is justified (though there are alternatives around), as long as you are well aware of what you are doing and what the tool does and doesn't do. And being a bit more cautious will never hurt.

Indeed. TE identification is not a straightforward field, I feel. The biology does not help either, as TEs are often so degenerate that similarity-based approaches no longer pick them up. So there is still much room for improvement here.

3.7 years ago
Juke34 8.9k

Extensive de-novo TE Annotator (EDTA)



Another vote for EDTA!

5.5 years ago
flogin ▴ 280

If you have the raw NGS reads for the genome, you can use dnaPipeTE (https://github.com/clemgoub/dnaPipeTE); for your assembled genome, you can use the TEdenovo and TEannot pipelines from the REPET package (https://urgi.versailles.inra.fr/Tools/REPET).

5.5 years ago
Beuss ▴ 140

RepeatMasker is a viable solution if you already have a database of TEs from a very closely related species. With RepeatMasker and Repbase alone, you won't be able to detect all the mobile elements specific to your newly assembled genome. Moreover, your annotation will be polluted by false positives.

REPET performs de novo detection using only the genome sequence. It is computationally heavy, but if your genome is too large or too repetitive, you can apply an iterative strategy to reduce the complexity. This way you will find the TE population specific to your genome and produce an exhaustive annotation of all the copies, even the very oldest ones.

https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-019-0150-y

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7562280

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0016526&type=printable


A pain to install, still waiting for a proper container.
