Frequency of repetitive DNA sequences in human genome?
10.4 years ago
liddiardk • 0

I have been mapping chromosome translocation breakpoints in a human dataset. I have made a series of observations relating to sequence features that these recombinant events have in common. However, I do not know how to generate 'control' values with which to compare my observations. Ensembl carries data relating to numbers of genes and coding sequence, but I would also like to know if it is possible to obtain similar data relating to the proportion of the human genome that is repetitive DNA (Alu, etc.) and heterochromatin, and the frequency of homopolymeric tracts and inverted repeats. Please can anyone direct me to databases/information/software that could help me obtain approximate figures for these values? Thanks, Kate

sequence alignment next-gen • 2.6k views
10.4 years ago

Perhaps the simplest route is to download the RepeatMasker .out file from UCSC and just parse the information you need from that. The fields there are annoyingly formatted (they're separated by variable numbers of spaces, so you'll need to clean that up), but all of the information on SINEs, LINEs, simple, homopolymer and other repeats you want is in there. There's also the tandem repeats BED12 file if you want to look at those.
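As a rough sketch of the clean-up described above: Python's `str.split()` with no arguments collapses runs of whitespace, which deals with the variable spacing. The function names here are made up for illustration, and the column positions assume the standard RepeatMasker .out layout (query sequence, begin and end in columns 5 to 7, repeat name and class/family in columns 10 and 11), so check them against your own file. Overlapping annotations would be counted twice by this simple sum.

```python
from collections import defaultdict


def parse_rm_out(lines):
    """Yield (chrom, start, end, repeat_name, repeat_class) tuples.

    Assumes the usual RepeatMasker .out column order; header and blank
    lines are skipped because their first token is not an integer score.
    """
    for line in lines:
        fields = line.split()  # collapses the variable-width spacing
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        chrom = fields[4]
        start = int(fields[5])  # 1-based, inclusive
        end = int(fields[6])    # 1-based, inclusive
        yield chrom, start, end, fields[9], fields[10]


def bases_per_class(records):
    """Sum annotated bases per repeat class/family (e.g. 'SINE/Alu')."""
    totals = defaultdict(int)
    for _chrom, start, end, _name, rep_class in records:
        totals[rep_class] += end - start + 1
    return dict(totals)
```

With a real file this could be called as `bases_per_class(parse_rm_out(open("your_file.out")))`, where the file name is a placeholder; dividing each total by the assembly length then gives an approximate genome fraction per class.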


There is also the "rmsk.txt.gz" MySQL dump, which gives similar information.
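A minimal sketch of reading that dump, assuming it is tab-delimited with the column order of the UCSC rmsk table (genoStart in column 7, genoEnd in column 8, repClass in column 12, with 0-based half-open coordinates); the accompanying rmsk.sql file should confirm this for your assembly. The function names and the genome size passed in are illustrative, not part of any UCSC tool.

```python
import csv
import gzip
from collections import Counter


def rmsk_class_bases(handle):
    """Sum annotated bases per repClass from tab-delimited rmsk rows."""
    totals = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 13:
            continue  # skip malformed or truncated rows
        geno_start, geno_end = int(row[6]), int(row[7])
        totals[row[11]] += geno_end - geno_start  # half-open interval
    return totals


def fraction_of_genome(totals, genome_size):
    """Convert per-class base counts to fractions of a given genome size."""
    return {cls: n / genome_size for cls, n in totals.items()}


if __name__ == "__main__":
    # The path is a placeholder for wherever the dump was downloaded.
    with gzip.open("rmsk.txt.gz", "rt") as fh:
        for cls, n in rmsk_class_bases(fh).most_common():
            print(cls, n)
```

The resulting fractions can serve as rough 'control' expectations, for example the share of breakpoints expected to fall in SINEs by chance, bearing in mind that overlapping annotations are counted more than once here.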


Indeed, and that's probably simpler to use.


Thanks so much. I have been using BED tracks based on RepeatMasker to annotate my sequence anyway, so I guess it would make sense to use the same source to extract and calculate frequencies of events.

Thanks for replying,

Kate
