Short tandem repeat (STR) reference for hg38
2.9 years ago
Raman2 ▴ 30

Is there a comprehensive reference for all the STRs (short tandem repeats) present in the human genome? I looked into a few different resources, but none of them contained all the repeats in hg38.

For example, I downloaded the UCSC simple repeats annotated by Tandem Repeats Finder (http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/simpleRepeat.txt.gz), but the list misses several repeats. I downloaded another reference, the one from GangSTR (https://github.com/gymreklab/GangSTR#references), and this one misses some repeats too. There was only about 30% overlap between the two databases.
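For anyone who wants to reproduce an overlap figure like this, here is a minimal sketch of one way to compute it. It assumes both references have already been loaded as `(chrom, start, end)` tuples with half-open coordinates, and it counts a query region as shared if it overlaps any reference region by at least one base. The function names and the toy intervals are hypothetical, not taken from either database, and a different overlap definition (for example, requiring identical coordinates) would give a different percentage.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def build_index(intervals):
    """Group intervals by chromosome, sorted by start, with a running max of ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # max_ends[i] is the largest end among the first i + 1 intervals
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end) shares at least one base with an indexed interval."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    i = bisect_left(starts, end)  # intervals [0, i) begin before `end`
    return i > 0 and max_ends[i - 1] > start


def overlap_fraction(query, reference):
    """Fraction of query intervals that overlap any reference interval."""
    index = build_index(reference)
    hits = sum(overlaps_any(index, c, s, e) for c, s, e in query)
    return hits / len(query) if query else 0.0


# Toy intervals for illustration only (not real database entries)
set_a = [("chr1", 100, 130), ("chr1", 200, 240), ("chr2", 50, 80)]
set_b = [("chr1", 120, 150), ("chr2", 80, 100)]
print(overlap_fraction(set_a, set_b))
```

Note that the result is not symmetric: the fraction of set A found in set B generally differs from the fraction of set B found in set A, so it is worth reporting both directions.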

On a slightly different note, I noticed that the shortest repeat region length (not repeat unit length) in hg38.simpleRepeat.txt.gz was 25, so any repeat region shorter than 25 bases is not included in it.
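A minimal sketch of how the minimum region length can be checked, assuming the file is tab-separated and follows the UCSC simpleRepeat table layout, where the first four columns are bin, chrom, chromStart and chromEnd with a 0-based start. The function names and the local file path are hypothetical.

```python
import gzip


def region_lengths(lines):
    """Yield chromEnd - chromStart for each tab-separated simpleRepeat row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # Assumed layout: bin, chrom, chromStart (0-based), chromEnd, ...
        yield int(fields[3]) - int(fields[2])


def shortest_region(path):
    """Return the shortest repeat region length in a gzipped simpleRepeat file."""
    with gzip.open(path, "rt") as handle:
        return min(region_lengths(handle))


# Usage, assuming the file has been downloaded to the working directory:
# print(shortest_region("simpleRepeat.txt.gz"))
```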

Thanks!

repeats STR • 1.2k views

Could you give examples of some missed repeats? It isn't clear how you're defining what is missed.


Thank you! For example, these regions are not found in the repeat databases:

CHR   START   STOP    SEQ        Repeat_Unit  Repeat_times
chr1  930278  930287  GGCGGCGGC  GGC          3
chr1  939279  939288  CTGCTGCTG  CTG          3
chr1  942602  942610  GCGCGCGC   GC           4
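To make the definition of "missed" concrete: the regions above are short perfect repeats (unit of 1 to 6 bases, at least 3 copies). A minimal sketch of a scan that reports such regions in a sequence string is shown below. It uses a plain regular expression rather than Tandem Repeats Finder or GangSTR, and the function names and thresholds are hypothetical choices for illustration.

```python
import re


def is_primitive(unit):
    """False if the unit is itself a repeat of a shorter unit (e.g. GCGC = GC x 2)."""
    n = len(unit)
    return not any(
        n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_strs(seq, max_unit=6, min_copies=3):
    """Return (start, end, unit, copies) for perfect tandem repeats, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A unit of length k followed by at least (min_copies - 1) exact copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                copies = (m.end() - m.start()) // k
                hits.append((m.start(), m.end(), unit, copies))
    return sorted(hits)


# Toy sequences with arbitrary flanking bases around the repeats listed above
print(find_perfect_strs("TTGGCGGCGGCAA"))
print(find_perfect_strs("AAGCGCGCGCTT"))
```

With thresholds this permissive, a genome-wide scan would report a very large number of regions, most of them far shorter than the 25-base minimum seen in the simpleRepeat file.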
