I'm converting some R code (from here: https://github.com/cancer-genomics/delfi_scripts) from the hg19 to the hg38 assembly. It relies on automatically downloading telomeric and centromeric regions from the UCSC table browser:
library(rtracklayer)
genome <- "hg19"
mySession <- browserSession()
genome(mySession) <- genome
gaps <- getTable(ucscTableQuery(mySession, track="gap"))
so that the resulting fragment calculations don't cover (some of) the less mappable regions of the genome. However, the corresponding code for hg38:
genome <- "hg38"
mySession <- browserSession()
genome(mySession) <- genome
gaps <- getTable(ucscTableQuery(mySession, track="gap"))
does not return the centromere positions from the UCSC table (although it does have all the telomeric ranges). I have tested the online table browser, and it does not return hg38 centromere positions either. Is there another source for these?
While UCSC support stops by here once in a while, you should probably report this directly to them (genome at soe.ucsc.edu) and then provide an update here.
I will provide feedback to them about this. However, I'm just looking for a table of positions in hg38 that I can bolt on to the existing removed regions to ease the workflow.
EDIT: It does look like this is a frequent question for them; see for instance: https://groups.google.com/a/soe.ucsc.edu/forum/#!msg/genome/SaR2y4UNrWg/XsGdMI3AazgJ
The answer to this query doesn't really help, though.
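For the "bolt on" step itself, here is a minimal base-R sketch. It assumes a centromere table with columns chrom, chromStart and chromEnd is already at hand from some source. The coordinates below are made-up placeholders, not real hg38 positions, and the names cen, centromeres and gaps_plus are hypothetical. The real gap table has more columns than shown here, so the columns would need to be matched before combining.

```r
# Existing removed regions, shaped like a cut-down gap table (toy values only).
gaps <- data.frame(
  chrom      = c("chr1", "chr1"),
  chromStart = c(0, 900),
  chromEnd   = c(10, 1000),
  type       = c("telomere", "telomere"),
  stringsAsFactors = FALSE
)

# Hypothetical centromere table: possibly several sub-ranges per chromosome.
# These coordinates are placeholders, NOT real hg38 positions.
cen <- data.frame(
  chrom      = c("chr1", "chr1", "chr2"),
  chromStart = c(400, 450, 300),
  chromEnd   = c(450, 500, 350),
  stringsAsFactors = FALSE
)

# Collapse to one range per chromosome: smallest start, largest end.
starts <- tapply(cen$chromStart, cen$chrom, min)
ends   <- tapply(cen$chromEnd, cen$chrom, max)

centromeres <- data.frame(
  chrom      = names(starts),
  chromStart = as.numeric(starts),
  chromEnd   = as.numeric(ends[names(starts)]),
  type       = "centromere",
  stringsAsFactors = FALSE
)

# Append the centromere ranges to the existing removed regions.
gaps_plus <- rbind(gaps, centromeres)
```

The result keeps the chr/start/end layout, so it should be usable wherever gaps was used before, provided the downstream code only relies on those columns.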
Hi, is there any solution yet?
I'm looking for the centromeric regions in release GRCh38.
I would need something like: chr start end
I don't understand where I can find it. I have looked everywhere and there is still no direct information.
Thanks
See my answer.