Hi,
I downloaded rmsk.txt from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_group=rep&hgta_track=rmsk&hgta_table=rmsk&hgta_doSchema=describe+table+schema)
and got the following:
bin swScore milliDiv milliDel milliIns genoName genoStart genoEnd genoLeft strand repName repClass repFamily repStart repEnd repLeft id
585 463 13 6 17 chr1 10000 10468 -248945954 + (TAACCC)n Simple_repeat Simple_repeat 1 471 0 1
585 3612 114 215 13 chr1 10468 11447 -248944975 - TAR1 Satellite telo -399 1712 483 2
585 484 251 132 0 chr1 11504 11675 -248944747 - L1MC5a LINE L1 -2382 395 199 3
585 239 294 19 10 chr1 11677 11780 -248944642 - MER5B DNA hAT-Charlie -74 104 1 4
585 318 230 37 0 chr1 15264 15355 -248941067 - MIR3 SINE MIR -119 143 49 5
585 18 232 0 19 chr1 15797 15849 -248940573 + (TGCTCC)n Simple_repeat Simple_repeat 1 52 0 6
585 18 137 0 0 chr1 16712 16744 -248939678 + (TGG)n Simple_repeat Simple_repeat 1 32 0 7
585 239 338 129 0 chr1 18906 19048 -248937374 + L2a LINE L2 2942 3104 -322 8
585 994 312 60 25 chr1 19971 20405 -248936017 + L3 LINE CR1 2680 3129 -970 9
585 270 331 7 27 chr1 20530 20679 -248935743 + Plat_L3 LINE CR1 2802 2947 -639 1
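For anyone who wants to work with these rows directly, here is a minimal Python sketch that reads the table and pulls out the genomic copies of one repName. It assumes the file is whitespace-delimited with the 17 columns shown in the header above; the function names `read_rmsk` and `copies_of` are made up for illustration and are not part of any UCSC tool. Note that this only gives the coordinates of individual copies on the genome, not a consensus sequence.

```python
import io

# Column names taken from the header line of the rmsk table above.
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd", "repLeft", "id",
]


def read_rmsk(handle):
    """Yield one dict per rmsk row (hypothetical helper).

    Assumes whitespace-delimited fields; skips blank lines and a header
    line that starts with "bin" or "#bin".
    """
    for line in handle:
        fields = line.split()
        if not fields or fields[0].lstrip("#") == "bin":
            continue
        if len(fields) != len(RMSK_COLUMNS):
            raise ValueError(f"expected {len(RMSK_COLUMNS)} fields, got {len(fields)}")
        row = dict(zip(RMSK_COLUMNS, fields))
        # Coordinates are stored as integers so lengths can be computed.
        row["genoStart"] = int(row["genoStart"])
        row["genoEnd"] = int(row["genoEnd"])
        yield row


def copies_of(rows, rep_name):
    """Return (chrom, start, end, strand) for every row with this repName."""
    return [
        (r["genoName"], r["genoStart"], r["genoEnd"], r["strand"])
        for r in rows
        if r["repName"] == rep_name
    ]


# Three rows copied from the table above, used here instead of the full file.
SAMPLE = """\
bin swScore milliDiv milliDel milliIns genoName genoStart genoEnd genoLeft strand repName repClass repFamily repStart repEnd repLeft id
585 463 13 6 17 chr1 10000 10468 -248945954 + (TAACCC)n Simple_repeat Simple_repeat 1 471 0 1
585 484 251 132 0 chr1 11504 11675 -248944747 - L1MC5a LINE L1 -2382 395 199 3
585 239 338 129 0 chr1 18906 19048 -248937374 + L2a LINE L2 2942 3104 -322 8
"""

if __name__ == "__main__":
    rows = list(read_rmsk(io.StringIO(SAMPLE)))
    for chrom, start, end, strand in copies_of(rows, "L1MC5a"):
        print(f"{chrom}:{start}-{end} ({strand}) {end - start} bp")
```

To run it on the real download, replace `io.StringIO(SAMPLE)` with `open("rmsk.txt")`.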
For example, take the repeat name L1MC5a. If I want to get the sequence of this repeat, should I look it up in RepBase? I could not find it there: http://www.girinst.org/repbase/update/browse.php?type=All&format=EMBL&autonomous=on&nonautonomous=on&simple=on&division=Homo+sapiens&letter=L
Does anyone have suggestions on how to solve this? Thanks a lot in advance.
Thanks for your reply, @genomax2.
I am not actually looking for genomic sequence for copies, I am looking for consensus sequences.
I don't think there is a consensus. The repeats by their nature will differ from one another. E.g., if I restrict the output to the L2a LINE repeat, I get this summary from the table browser: