I want to identify the short repeated sequences in a given DNA sequence. For example, the sequence I have is atacctgcc ccc atacctgcc. The repeated sequence is atacctgcc. Is there a program to identify it?
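If a quick script is enough, one simple approach (not the method of any tool mentioned in this thread) is to count every substring of a fixed length k and keep the ones that occur more than once. The C++ sketch below assumes the spaces in the example are only there for readability, so the hypothetical helper `cleanSequence` drops anything that is not A/C/G/T. `repeatedKmers` is also a hypothetical name, and you have to choose k yourself.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: keep only A/C/G/T (upper-cased), dropping spaces etc.
std::string cleanSequence(const std::string& raw) {
    std::string s;
    for (unsigned char c : raw) {
        const char u = static_cast<char>(std::toupper(c));
        if (u == 'A' || u == 'C' || u == 'G' || u == 'T') s.push_back(u);
    }
    return s;
}

// Hypothetical helper: map every k-mer that occurs at least twice
// to its 0-based start positions (occurrences may overlap).
std::map<std::string, std::vector<std::size_t>>
repeatedKmers(const std::string& s, std::size_t k) {
    assert(k > 0);
    std::map<std::string, std::vector<std::size_t>> positions;
    for (std::size_t i = 0; i + k <= s.size(); ++i) {
        positions[s.substr(i, k)].push_back(i);
    }
    // Drop k-mers that were seen only once.
    for (auto it = positions.begin(); it != positions.end();) {
        if (it->second.size() < 2) it = positions.erase(it);
        else ++it;
    }
    return positions;
}
```

With k = 9 on the example, the only repeated 9-mer is ATACCTGCC, at positions 0 and 12. Storing every substring costs about n·k characters, so for a whole chromosome a hashed or suffix-array index would be a better fit.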
Tandem Repeats Finder (TRF): http://tandem.bu.edu/trf/trf.html
Hi. If you like C++, you can use the function TandemRepeatsFinding from my library CBioInfCpp.h (https://github.com/chernouhov/CBioInfCpp-0-).
Here is its output for a file containing a chromosome sequence (note: this is a very large .txt file):
https://drive.google.com/file/d/1U_0Avzncl72ST4GlTko7aivEHa7HTdjy/view
Some lines from it (position with 0-based indexing, length, and the repeat string itself):
84497 8 AAACAAAC
91446 16 AAACAAACAAACAAAC
112057 8 AAACAAAC
120004 8 AAACAAAC
127534 8 AAACAAAC
142152 8 AAACAAAC
152054 8 AAACAAAC
180098 8 AAACAAAC
188353 8 AAACAAAC
188742 8 AAACAAAC
191433 8 AAACAAAC
193100 8 AAACAAAC
257706 8 AAACAAAC
259508 8 AAACAAAC
286463 12 AAACAAACAAAC
295440 8 AAACAAAC
307986 8 AAACAAAC
315909 8 AAACAAAC
317100 8 AAACAAAC
317744 12 AAACAAACAAAC
335172 8 AAACAAAC
348833 8 AAACAAAC
379676 8 AAACAAAC
397346 8 AAACAAAC
402858 12 AAACAAACAAAC
405483 8 AAACAAAC
555931 8 AAACAAAC
629015 8 AAACAAAC
725561 8 AAACAAAC
763278 8 AAACAAAC
778926 8 AAACAAAC
781034 8 AAACAAAC
794123 8 AAACAAAC
823120 24 AAACAAACAAACAAACAAACAAAC
823455 12 AAACAAACAAAC
888206 16 AAACAAACAAACAAAC
...
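For readers curious how exact tandem repeats can be detected, here is a minimal C++ sketch that produces the same kind of position / length / string records. It is not the algorithm of TandemRepeatsFinding (its internals are not described here) nor of TRF, which also tolerates mismatches and indels. `findTandemRepeats`, `isPrimitive`, and the `minPeriod`/`maxPeriod` parameters are hypothetical. For each period p it scans maximal stretches where s[k] == s[k + p]; a stretch of at least p matches means two or more adjacent copies of a p-long unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct TandemRepeat {
    std::size_t pos;     // 0-based start position
    std::size_t length;  // total length, whole copies only
    std::string seq;     // the repeat itself
};

// True if u is not two or more copies of a shorter string.
bool isPrimitive(const std::string& u) {
    return (u + u).find(u, 1) == u.size();
}

// Hypothetical helper: exact tandem repeats whose unit length is in
// [minPeriod, maxPeriod]. Requires minPeriod >= 1.
std::vector<TandemRepeat> findTandemRepeats(const std::string& s,
                                            std::size_t minPeriod,
                                            std::size_t maxPeriod) {
    assert(minPeriod >= 1);
    std::vector<TandemRepeat> out;
    const std::size_t n = s.size();
    for (std::size_t p = minPeriod; p <= maxPeriod && 2 * p <= n; ++p) {
        std::size_t i = 0;
        while (i + p < n) {
            // Extend the maximal stretch of positions k with s[k] == s[k + p].
            std::size_t k = i;
            while (k + p < n && s[k] == s[k + p]) ++k;
            const std::size_t matched = k - i;  // s[i, k + p) has period p
            if (matched >= p) {                 // at least two full copies
                const std::size_t copies = (matched + p) / p;
                // Skip non-primitive units such as AAACAAAC (= AAAC twice);
                // that run is reported under the shorter period if it is in range.
                if (isPrimitive(s.substr(i, p))) {
                    out.push_back({i, copies * p, s.substr(i, copies * p)});
                }
            }
            i = (k == i) ? i + 1 : k;
        }
    }
    return out;
}
```

On GTAAACAAACAAACAAACTG with periods 4 to 8 this reports a single repeat, 2 16 AAACAAACAAACAAAC; the period-8 run over the same bases is discarded because AAACAAAC is itself AAAC twice. The scan is roughly O(n · maxPeriod), which is acceptable for small periods but is no substitute for TRF on real data with sequencing errors or mutations.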
Are you looking for tandem repeats (uninterrupted, adjacent copies) or interspersed repeats (copies separated by long stretches of other sequence, or located on different chromosomes)? The first answer (TRF) is useful for tandem repeats, not for interspersed ones.
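To tell the two cases apart on a short sequence, a minimal sketch (hypothetical names `longestRepeat` and `classify`, exact matches only) finds the longest substring that occurs at least twice by sorting suffixes and comparing neighbours, then checks whether the two copies touch. Applied to the question's example with the spaces removed, it returns copies at 0 and 12 of length 9, separated by CCC, so it classifies them as an interspersed repeat rather than a tandem one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Repeat {
    std::size_t first;   // 0-based start of the earlier copy
    std::size_t second;  // 0-based start of the later copy
    std::size_t length;  // 0 means no repeat was found
};

// Hypothetical helper: longest substring occurring at least twice,
// found by sorting suffixes and comparing adjacent ones.
Repeat longestRepeat(const std::string& s) {
    std::vector<std::size_t> sa(s.size());
    for (std::size_t i = 0; i < sa.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    Repeat best{0, 0, 0};
    for (std::size_t i = 1; i < sa.size(); ++i) {
        const std::size_t a = sa[i - 1], b = sa[i];
        std::size_t len = 0;  // common prefix length of the two suffixes
        while (a + len < s.size() && b + len < s.size() && s[a + len] == s[b + len]) ++len;
        if (len > best.length) best = {std::min(a, b), std::max(a, b), len};
    }
    return best;
}

// Hypothetical helper: "tandem" if the copies abut, "interspersed" if other
// bases separate them, "overlapping" if they share bases.
std::string classify(const Repeat& r) {
    assert(r.first <= r.second);
    if (r.length == 0) return "none";
    if (r.second < r.first + r.length) return "overlapping";
    return r.second == r.first + r.length ? "tandem" : "interspersed";
}
```

Sorting suffixes with plain string comparisons is O(n² log n) in the worst case, so this only suits short inputs; a proper suffix array with an LCP array does the same job in linear or near-linear time.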