9.2 years ago
Elke Schaper
For my Ph.D., I've implemented solutions to a large number of tasks related to sequence tandem repeats. We've now decided to make the code accessible and reusable for others, hoping that it's going to save some of you a lot of time!
Features
- Detect nucleic or protein tandem repeats with de novo software. TRAL can be used to run, parse, merge and output results from external tandem repeat detection tools in an output format of choice.
- Detect tandem repeats with sequence profile HMMs. This is useful if you already roughly know the sequence of your tandem repeat, but want to either refine the annotation (e.g. if some repeat units are missing from the annotation) or search for homologous tandem repeats in other sequences.
- Statistical significance analysis of putative tandem repeats. We and others have found that specificity is a big issue with many tandem repeat annotation tools. To make sure you can trust your tandem repeat annotations, TRAL ships with ad hoc and model-based statistical tests for nucleic and protein tandem repeats. Using these tests, each tandem repeat is tagged with a p-value, and you can decide the threshold.
- Overlap detection and filtering. When you merge tandem repeat annotations from several sources, you may want to discard overlapping repeats. Several definitions of overlap are implemented in TRAL.
- (New) Reconstruct tandem repeat unit phylogenies.
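To make the significance and overlap filtering steps above a bit more concrete, here is a minimal sketch in plain Python. It does not use the TRAL API: the `Repeat` class, the function names, the detector names and the example values are all hypothetical and only illustrate the idea. It assumes the simplest overlap definition (two repeats share at least one position) and keeps the more significant repeat of each overlapping pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repeat:
    """Hypothetical stand-in for one annotated tandem repeat (not a TRAL class)."""
    begin: int      # 0-based start position in the sequence
    end: int        # end position, exclusive
    pvalue: float   # p-value assigned by a significance test
    source: str     # name of the detector that reported the repeat


def filter_significant(repeats, threshold=0.05):
    """Keep only repeats whose p-value is below the chosen threshold."""
    return [r for r in repeats if r.pvalue < threshold]


def overlaps(a, b):
    """One possible overlap definition: the regions share at least one position."""
    return a.begin < b.end and b.begin < a.end


def discard_overlapping(repeats):
    """Greedily keep the most significant repeat among overlapping ones."""
    kept = []
    for candidate in sorted(repeats, key=lambda r: r.pvalue):
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda r: r.begin)


# Made-up annotations from two hypothetical detectors.
repeats = [
    Repeat(10, 40, 0.001, "detector_a"),
    Repeat(35, 60, 0.020, "detector_b"),    # overlaps the first repeat
    Repeat(100, 130, 0.300, "detector_a"),  # not significant at 0.05
    Repeat(200, 230, 0.010, "detector_b"),
]

significant = filter_significant(repeats, threshold=0.05)
non_redundant = discard_overlapping(significant)
```

In this sketch the threshold is applied first, so a non-significant repeat can never displace a significant one during overlap filtering. Other overlap definitions (for example, requiring a shared repeat unit rather than a shared position) would only change the `overlaps` function.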
Technical details
- Implemented for Python 3
- Installation instructions, documentation and tutorials are available on GithubIO.
- Open-access source code is available on GitHub.
- TRAL is on PyPI and can be installed with:
pip install tral
Tutorials
- Extensive tutorials are available on GithubIO. Please email me if you would like a tutorial on a specific task!
Example
This is a short example of how you can annotate your sequences with TRF in three lines of code:
# Python 3
from tral.sequence import sequence
# Read all sequences from a FASTA file.
sequences = sequence.Sequence.create(file="path/to/my/sequences.fa", input_format="fasta", sequence_type="DNA")
# Run the de novo detector TRF on every sequence.
tandem_repeats = [i_seq.detect(denovo=True, detection={"detectors": ["TRF"]}) for i_seq in sequences]
More examples are available in the docs.
Your feedback - every comment is helpful!
If you believe TRAL might help your research or save you time, please feel free to contact me or post to the project:
- Feature requests
- How to implement specific tasks
- Bug reports
Publications
Here is some background on TRAL:
- TRAL: E Schaper, A Korsunsky, J Pecerska, A Messina, R Murri, H Stockinger, S Zoller, I Xenarios, and M Anisimova (2015). TRAL: Tandem Repeat Annotation Library. Bioinformatics. DOI: 10.1093/bioinformatics/btv306
- Statistical testing of tandem repeats, benchmark of tandem repeat annotation tools: E Schaper, AV Kajava, A Hauser & M Anisimova (2012). Repeat or not repeat?-Statistical validation of tandem repeat prediction in genomic sequences. NAR. DOI: 10.1093/nar/gks726
- Phylogenetic analysis of tandem repeat unit evolution: E Schaper, O Gascuel & M Anisimova (2014). Deep conservation of human protein tandem repeats within the eukaryotes. MBE. DOI: 10.1093/molbev/msu062
- Short intro to some computational issues with tandem repeats: M Anisimova, J Pecerska, E Schaper (2015). Statistical approaches to detecting and analyzing tandem repeats in genomic sequences. Frontiers in Bioengineering and Biotechnology. DOI: 10.3389/fbioe.2015.00031