I'm trying to detect and identify transposable elements in my plant genomes, and I'm having some trouble finding the best programs and pipelines to use. The major reviews of these programs all seem to have come out 5-8 years ago, and I wasn't able to find anything that covered newer programs since then. Does anyone have any experience with finding TEs and would be able to set me in the right direction?
The current plan was to use Tedna and RepeatModeler to detect TEs from our raw read files, but we have de novo assembled genomes that I would like to investigate as well. I would like to run a few different programs and get a good consensus so that I can eliminate false positives that individual programs might put out.
I can definitely help point you in the right direction, but it would also help to know some background on what you are trying to accomplish. In general, TE-finding programs are based on some combination of 1) mathematical repeat patterns (k-mer frequency), 2) similarity to some reference database, 3) clustering based on a self-comparison of the data set, or 4) structural features (LTRs, TIRs, etc.). I would say those approaches are in order of complexity to perform, and also in order of how biologically relevant they are.
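To make approach 1 concrete, here is a minimal Python sketch of k-mer frequency counting. It is only an illustration of the idea, not how any of the programs named in this thread are implemented; the function names, the toy reads, and the `min_count` threshold are all made up for the example.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every k-mer of length k across a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def overrepresented(counts, min_count):
    """Keep only k-mers seen at least min_count times (hypothetical cutoff)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

# Toy reads: the first two share a repeated ACGT motif, the third does not.
reads = ["ACGTACGTAC", "TTACGTACGG", "GGGGCCCCAA"]
counts = kmer_counts(reads, k=4)
repeats = overrepresented(counts, min_count=3)
print(sorted(repeats))
```

On real data you would use a much larger k and a dedicated k-mer counter, and a high count alone says nothing about what kind of element the k-mer came from, which is why this approach sits at the simple end of the list.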
Transposome was designed for characterizing TE abundance/diversity from raw reads, and it performs very well in terms of accuracy on plant genomes (an example with maize is presented in the paper). I'm the author so I could answer any questions related to the usage. Transposome is based on a clustering approach with the annotations being assigned from a repeat database.
For identifying TEs from an assembled genome, you need to think about what type of TE you are interested in. I use many different programs for this task, with each program being designed for one specific type of TE (based on its structural features). Programs like Recon and RepeatModeler are based on k-mer frequencies, and the goal of RepeatModeler is to try to construct a TE from k-mers. The result is going to be a contig representing the most frequently occurring parts of the element in the genome. Usually this will be the internal coding region because it is more conserved than the flanking repeats. What you get is not a real transposon from a single locus; rather, it is just a representative of the repeats found in the genome. This approach can still be useful if you know exactly what you are trying to find out (e.g., a quick survey or a quick comparison of species). If you want a high-quality reference set of TEs for your genome, then I would strongly warn against this approach because the output is not composed of real transposons (and is therefore not particularly useful for evolutionary analyses).
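As a rough illustration of what "structural features" means for one TE type, the Python sketch below checks a candidate element for terminal inverted repeats (TIRs) by comparing its start to the reverse complement of its end. This is a simplified, exact-match toy and not the algorithm of any specific program; the function names and example sequences are invented for the example. Real structure-based tools typically also tolerate mismatches and look for additional signals.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def tir_length(seq):
    """Length of the exact terminal inverted repeat of a candidate element.

    Walks inward from the 5' end, comparing each base with the same
    position of the reverse-complemented sequence, and stops at the
    first mismatch or at the midpoint of the element.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    length = 0
    for left, right in zip(seq[:len(seq) // 2], rc):
        if left != right:
            break
        length += 1
    return length

# Toy element: 6 bp TIR (CAGTGC ... GCACTG) around a short internal region.
candidate = "CAGTGC" + "AAAAA" + "GCACTG"
print(tir_length(candidate))   # 6
print(tir_length("AAAACCCC"))  # 0, the ends are not inverted repeats
```

Because a check like this is applied to an actual stretch of the assembly, a hit corresponds to a specific locus rather than to a consensus built from frequent k-mers.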
updated 2.2 years ago by Ram • written 9.2 years ago by SES
Thanks for the answer. I wasn't aware that RepeatModeler worked this way. I want to compare TEs between sister species and sister clades, so I guess that RepeatModeler might not be the best choice here. Do you know anything about REPCLASS?
I will look at Transposome. I really appreciate the help, thank you!
I have used REPCLASS but didn't find it very useful. After all the setup, it classified things only to a very coarse level, though I have not tried it in ~5 years. Comparing between closely related species is an ideal use for Transposome. Here is a link to the study for which I designed Transposome (a comparative analysis of TE properties across a plant family). That paper may be helpful in designing your study because it goes beyond the typical descriptive statistics.