Question

Downloading genomic interval for hg38

0


7.8 years ago

ChIP ▴ 600

Hi,

I want to download genomic annotations for hg38 that include the following information:

region gene exon/tss/intron/intergenic/CpG/nonCpG

The idea is to get this in a table format and then use intersectBed to get the overlap between the ChIP-seq data and this genomic annotation file.

How can I get this information? The CpG information is important, as I have methylation data.

Thank you

ChIP-Seq • 4.5k views

ADD COMMENT • link updated 7.7 years ago by Alex Reynolds 36k • written 7.8 years ago by ChIP ▴ 600

1


Hi, please look at the UCSC Genome Browser and select the Table Browser; you can output all the required information from there.

ADD REPLY • link 7.8 years ago by Paul ★ 1.5k

score 3 · Answer 1 · 2017-07-11

GENCODE release 26 offers some of these annotations for hg38. Pipe to BEDOPS gff2bed to make a sorted BED file.

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_26/gencode.v26.annotation.gff3.gz | gunzip -c - | gff2bed - > gencode.v26.bed
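The question also asks for TSS intervals, which are not a separate feature type in the annotation. As a rough sketch, they could be derived from the gene records by taking the 5' end on each strand. This assumes the gff2bed output layout is chrom, start, end, ID, score, strand, source, type, phase, attributes (tab-delimited); the rows, IDs, and file names below are made up for illustration.

```shell
# Toy rows in the assumed gff2bed column layout (hypothetical IDs and coordinates)
printf 'chr1\t11868\t14409\tENSG_A\t.\t+\tHAVANA\tgene\t.\tgene_name=GENE_A\n' > genes_sample.bed
printf 'chr1\t14403\t29570\tENSG_B\t.\t-\tHAVANA\tgene\t.\tgene_name=GENE_B\n' >> genes_sample.bed
printf 'chr1\t11868\t12227\texon_A1\t.\t+\tHAVANA\texon\t.\tgene_name=GENE_A\n' >> genes_sample.bed

# Keep gene records only and emit a 1-bp TSS interval per gene
awk 'BEGIN { FS = OFS = "\t" }
     $8 == "gene" {
         if ($6 == "+") { s = $2; e = $2 + 1 }   # 5-prime end on the plus strand
         else           { s = $3 - 1; e = $3 }   # 5-prime end on the minus strand
         print $1, s, e, $4, "tss", $6
     }' genes_sample.bed > tss_sample.bed
cat tss_sample.bed
```

On real data the result would still need to go through sort-bed before use with BEDOPS, and the same `$8 == "exon"` style of filter could pull out exons.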

For hg38, you can grab the cpgIslandExt table from UCSC's goldenpath service, and use BEDOPS sort-bed to build a sorted BED4+ file:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cpgIslandExt.txt.gz \
   | gunzip -c - \
   | awk 'BEGIN{ OFS="\t"; }{ print $2, $3, $4, $5$6, substr($0, index($0, $7)); }' - \
   | sort-bed - \
   > cpgIslandExt.hg38.bed

According to the table schema for this file, the first four columns are the island's genomic interval and name. The remaining columns are the island length, the number of CpGs in the island, the number of C and G bases in the island, the percentage of the island that is CpG, the percentage of the island that is C or G, and the ratio of observed (cpgNum) to expected (numC*numG/length) CpG in the island.
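One possible caveat with the awk step: `index($0, $7)` finds the first place the length string occurs anywhere in the line, which could in principle land inside a coordinate. A tab-only variant sidesteps that. This is a sketch assuming the table is tab-delimited with a name field of the form `CpG: N`; the sample row and file names are illustrative only.

```shell
# Hypothetical sample row in the cpgIslandExt layout (tab-delimited):
# bin, chrom, chromStart, chromEnd, name, length, cpgNum, gcNum, perCpg, perGc, obsExp
printf '585\tchr1\t28735\t29810\tCpG: 116\t1075\t116\t787\t21.6\t73.2\t0.83\n' > cpg_sample.txt

# Split on tabs only, drop the bin column, and remove the space inside the name
awk 'BEGIN { FS = OFS = "\t" }
     { gsub(/ /, "", $5); print $2, $3, $4, $5, $6, $7, $8, $9, $10, $11 }' cpg_sample.txt > cpg_sample.bed
cat cpg_sample.bed
```

For the real file, the same awk program would sit between `gunzip -c -` and `sort-bed -` in place of the original one.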

Once you have these files in sorted BED format, you can start doing set operations and mapping with the BEDOPS tools bedops and bedmap, among others.
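To make the CpG/nonCpG labelling concrete, here is a plain-awk stand-in on toy data, not BEDOPS itself, so the overlap logic can be seen without installing anything. The peak and island intervals and the file names are hypothetical. With BEDOPS installed, `bedops --element-of 1` and `bedops --not-element-of 1` should give the overlapping and non-overlapping peaks as two separate sets.

```shell
# Toy ChIP-seq peaks and CpG islands (hypothetical intervals)
printf 'chr1\t100\t200\tpeak1\nchr1\t500\t600\tpeak2\n' > peaks_sample.bed
printf 'chr1\t150\t300\tCpG:20\n' > islands_sample.bed

# Naive all-against-all overlap test; fine for a demo, slow on genome-scale data
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }   # load islands
     {
         label = "nonCpG"
         for (i = 1; i <= n; i++)
             if ($1 == chr[i] && ($2 + 0) < e[i] && ($3 + 0) > s[i]) { label = "CpG"; break }
         print $0, label
     }' islands_sample.bed peaks_sample.bed > peaks_labeled.bed
cat peaks_labeled.bed
```

The half-open overlap test (`start < other_end && end > other_start`) matches BED coordinate conventions; for real data the sorted-input tools will be far faster.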

score 2 · Answer 2 · 2017-07-10

2


7.7 years ago

shwethacm ▴ 240

The UCSC Table Browser has what you need (and more!): https://genome.ucsc.edu/cgi-bin/hgTables

(PS: CpG information is under group: Regulation.)


score 1 · Answer 3 · 2017-07-10

1


7.7 years ago

Ming Tommy Tang ★ 4.6k

See http://crazyhottommy.blogspot.com/2016/11/define-intronic-exonic-and-intergenic.html. It is written for hg19, but you can just switch to hg38.
