GENCODE release 26 offers some of these annotations for hg38. Pipe the GFF3 file to BEDOPS gff2bed to make a sorted BED file:
$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_26/gencode.v26.annotation.gff3.gz | gunzip -c - | gff2bed - > gencode.v26.bed
For hg38, you can grab the cpgIslandExt table from UCSC's goldenpath service, and use BEDOPS sort-bed to build a sorted BED4+ file:
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cpgIslandExt.txt.gz \
| gunzip -c - \
| awk '{ printf "%s\t%s\t%s\t%s%s", $2, $3, $4, $5, $6; for (i = 7; i <= NF; i++) printf "\t%s", $i; printf "\n"; }' - \
| sort-bed - \
> cpgIslandExt.hg38.bed
According to the table schema for this file, the first four columns are the island's genomic interval and name. The remaining columns are the island length, the number of CpGs in the island, the number of C and G bases in the island, the percentage of the island that is CpG, the percentage of the island that is C or G, and the ratio of observed (cpgNum) to expected (numC*numG/length) CpGs in the island.
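As a rough sketch of how those extra columns can be used, the following filters islands on length and observed/expected ratio with awk. The two sample rows, the file names islands.sample.bed and islands.filtered.bed, and both cutoffs are made up for illustration; on real data you would point awk at cpgIslandExt.hg38.bed instead.

```shell
# Two made-up rows in the same 10-column layout as cpgIslandExt.hg38.bed.
printf 'chr1\t28735\t29810\tCpG:111\t1075\t111\t731\t20.7\t68.0\t0.90\n'  > islands.sample.bed
printf 'chr1\t135124\t135563\tCpG:30\t439\t30\t295\t13.7\t67.2\t0.64\n' >> islands.sample.bed

# Columns: 5=length 6=cpgNum 7=gcNum 8=perCpg 9=perGc 10=obsExp
# Keep islands of at least 500 bp with an observed/expected CpG ratio >= 0.75
# (arbitrary thresholds, chosen only for this example).
awk -F'\t' '$5 >= 500 && $10 >= 0.75' islands.sample.bed > islands.filtered.bed
cat islands.filtered.bed
```

Because awk only drops rows and never reorders them, the filtered file stays sorted and can go straight into other BEDOPS tools.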
Once you have these files in sorted BED format, you can start doing set operations and mapping with BEDOPS bedops, bedmap, etc.
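For instance, here is a minimal sketch of one mapping and one set operation, assuming BEDOPS is installed. The toy files and their coordinates are invented stand-ins for gencode.v26.bed and cpgIslandExt.hg38.bed, so swap in the real files once you have them.

```shell
# Toy stand-ins for the two sorted BED files (coordinates are made up).
printf 'chr1\t100\t200\tgeneA\nchr1\t500\t600\tgeneB\n' > genes.toy.bed
printf 'chr1\t150\t250\tCpG:10\nchr1\t900\t950\tCpG:5\n' > islands.toy.bed

# Mapping: for each gene, append the names of overlapping islands
# (the appended field is empty when nothing overlaps).
bedmap --echo --echo-map-id --delim '\t' genes.toy.bed islands.toy.bed > genes_with_islands.toy.bed

# Set operation: keep only the islands that overlap a gene by at least 1 bp.
bedops --element-of 1 islands.toy.bed genes.toy.bed > islands_in_genes.toy.bed
```

Both tools expect sorted input, which is why the earlier steps go through gff2bed and sort-bed.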
Hi, please have a look at the UCSC Genome Browser and select the Table Browser: you can get all the required information from there.