Hopefully an easy one: I'm looking for a file (e.g. a BED file) containing the coordinates of every base in the human build 37 genome that is covered by a segmental duplication.
I've downloaded the full set of seg dups from http://humanparalogy.gs.washington.edu/build37/build37.htm but these appear to contain a redundant set of all pairwise locations of segmental duplications. I could write some code to merge these, but has anyone already generated a non-redundant file that simply tells me whether a given GRCh37 base is in fact spanned by a seg dup?
Please note that different databases may give you vastly different results. The first question to ask is "which is the most accurate" instead of "which is the most convenient". Merging overlapping regions in a BED is extremely easy. You can use bedtools, or just one line of awk.
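To make the "merging is easy" point concrete, here is a minimal Python sketch of the merge step. It assumes the input has already been reduced to (chrom, start, end) records in BED-style coordinates. The function name `merge_bed_intervals` and the example coordinates are made up for illustration, and intervals that merely touch are merged too, which is my choice here rather than anything the source database prescribes.

```python
from collections import defaultdict


def merge_bed_intervals(records):
    """Collapse overlapping or touching (chrom, start, end) records.

    Hypothetical helper for illustration: returns a non-redundant list
    of intervals, sorted by chromosome name (plain string order) and
    then by start position.
    """
    # Group intervals by chromosome so each can be merged independently.
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((int(start), int(end)))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: emit the finished block and start a new one.
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


if __name__ == "__main__":
    # Made-up pairwise records, not real seg dup coordinates.
    example = [
        ("chr1", 100, 200),
        ("chr1", 150, 300),
        ("chr2", 10, 20),
        ("chr1", 400, 500),
    ]
    for chrom, start, end in merge_bed_intervals(example):
        print(f"{chrom}\t{start}\t{end}")
```

If you would rather not write any code, `bedtools merge -i` does the same job on a BED file that has been sorted by chromosome and start position.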
If you want BED format, use the Table Browser: click this link, select BED from the "output format" dropdown menu, click "get output", and then click "get BED" on the next page.
Just a note that if you do this from Galaxy, you can then merge the overlapping BED records to get the unique regions covered by at least one segmental duplication.