ChIP-seq heatmap input bed file
3.1 years ago
buffealo ▴ 130

Hello,

I am confused about the concept of a ChIP-seq heatmap.

In particular, what exactly should the .bed file be? For example, with deepTools you run computeMatrix before plotHeatmap, and you have to provide a bigWig file and a BED file.

The bigWig file should be the ChIP-seq sample coverage, but what should the BED file be in a typical ChIP-seq enrichment heatmap (where the reference position is usually labelled as center, or indicated with 0)? Where can I find that BED file?

The documentation also describes the BED file as the genes (locations). So are center, 0, and TSS actually the same thing?

Thank you very much in advance.

bed chip-seq deeptools heatmap • 2.8k views
3.1 years ago
Papyrus ★ 3.0k

Well, heatmaps of this type are typically used in ChIP-seq to show the signal of your protein (bigWig) in some regions of the genome (BED). So the BED file is up to you: it depends on your biological question and on what you want to plot. It contains the regions in which to plot the ChIP-seq signal. For example, if your protein tends to be at promoters, you may want to input a BED file of gene coordinates, to plot the gene regions plus X bp upstream and see the signal around those locations. But if your protein is at enhancers, you may want to input a BED file of enhancer locations to see the signal there.

The TSS and TES labels are, by default in computeMatrix, the labels assigned to the start and end of the BED coordinates. So if you input regions that are not genes, they will not really be a TSS or TES, and you may want to change the labels (most simply to "start" and "end").
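As a rough illustration of what those labels point at, here is a minimal shell sketch that prints the start boundary, the center and the end boundary of each region. The file names `regions.bed` and `ref_points.tsv` and the two toy regions are made up for the example, and the strand handling (start boundary taken from the higher coordinate on the minus strand) is the usual convention for stranded BED input, stated here as an assumption rather than taken from this thread.

```shell
# Hypothetical regions: one on the + strand, one on the - strand (BED6).
printf 'chr1\t1000\t2000\tgeneA\t0\t+\n' > regions.bed
printf 'chr1\t5000\t7000\tgeneB\t0\t-\n' >> regions.bed

# For each region print: name, "start" boundary, center, "end" boundary.
# On the - strand the boundaries are swapped, because the region is read
# from its higher coordinate towards its lower one.
awk 'BEGIN { FS = OFS = "\t" }
     {
         center = int(($2 + $3) / 2)
         if ($6 == "-") print $4, $3, center, $2
         else           print $4, $2, center, $3
     }' regions.bed > ref_points.tsv

cat ref_points.tsv
```

So "center", "TSS" and "TES" are three different positions of the same region, and "0" on the x-axis is simply whichever of them was chosen as the reference point. In plotHeatmap the displayed text can be changed with `--startLabel`, `--endLabel` and `--refPointLabel`.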


Thank you so much. It is clearer to me now. So, as I understand it, for a typical representation of a transcription factor ChIP-seq signal I should use a BED file of the genes (in my case for hg19). Can I find one on the web, or do I need to run an additional analysis to generate a gene BED file for that particular ChIP-seq experiment? I hope you can reply. Thank you in advance. Best


There are many ways to do this, but it is difficult to define what a gene is (as genes have many transcripts). To get all isoforms, a quick option is to go to the UCSC Table Browser and use the options shown in this image to output all the isoforms in BED format:

[Image: tablebrowser]

There are other ways to get "genes"; for example, in R you could do:

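library(TxDb.Hsapiens.UCSC.hg19.knownGene)
hg19 <- TxDb.Hsapiens.UCSC.hg19.knownGene
gene_ranges <- genes(hg19)
df <- as.data.frame(gene_ranges)
# BED starts are 0-based, so shift the 1-based start; keep the strand so TSS/TES are oriented correctly
bed <- data.frame(df$seqnames, df$start - 1, df$end, df$gene_id, 0, df$strand)
write.table(bed, file = "genes.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)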

Nonetheless, probably the best approach for any project is to download the GTF for your genome build (i.e. the GTF/GFF file that sits next to the FASTA file of the genome you used, from UCSC, GENCODE or wherever) and use that for everything. You can then use many strategies to get a BED file of gene coordinates, such as those described in these posts.
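One such strategy, sketched with awk under some assumptions: the GTF has `gene` rows in column 3 and a `gene_id "..."` attribute in column 9 (as GENCODE/Ensembl-style files do; some UCSC GTFs only contain transcript and exon rows, in which case the filter would need adapting). The file names and the toy records below are hypothetical and only there to make the example self-contained.

```shell
# Hypothetical three-line GTF: two genes and one exon.
printf 'chr1\tHAVANA\tgene\t1001\t2000\t.\t+\t.\tgene_id "G1"; gene_name "A";\n' > annotation.gtf
printf 'chr1\tHAVANA\texon\t1001\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> annotation.gtf
printf 'chr2\tHAVANA\tgene\t501\t900\t.\t-\t.\tgene_id "G2"; gene_name "B";\n' >> annotation.gtf

# Keep gene rows, pull gene_id out of column 9, and shift the 1-based GTF
# start to the 0-based BED start.
# Output columns: chrom, start, end, name, score, strand.
awk 'BEGIN { FS = OFS = "\t" }
     $3 == "gene" {
         id = "."
         if (match($9, /gene_id "[^"]+"/)) {
             id = substr($9, RSTART + 9, RLENGTH - 10)
         }
         print $1, $4 - 1, $5, id, 0, $7
     }' annotation.gtf > genes.bed

cat genes.bed
```

Keeping the strand in column 6 matters here, since without it the start of every region would be treated as the TSS, including for genes on the minus strand.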


Thank you so much, it is definitely clear to me now.


So, suppose I want to plot a profile of the TSS-TES region, with a region body length of 5000 bp and before/after region start lengths of 3000 bp (computeMatrix parameters). Should I create a BED file in which each line gives the start and end positions of a transcript? Is it correct to use this command to generate the BED file?

awk '$3 == "gene"' species.gff3 | awk 'BEGIN{FS="\t|=|;";OFS="\t"}{print $1,$4,$5}' > gene.bed
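For comparison, a minimal sketch of a variant of that command, with two changes that are suggestions rather than anything confirmed in this thread: the start is shifted by one (GFF3 is 1-based, BED is 0-based), and the ID and strand are kept so that TSS and TES can be oriented for minus-strand genes. The names `toy.gff3` and `toy_gene.bed` and the records in them are hypothetical.

```shell
# Hypothetical GFF3 with two genes and one mRNA.
printf 'chr1\t.\tgene\t1001\t6000\t.\t+\t.\tID=gene1;Name=A\n' > toy.gff3
printf 'chr1\t.\tmRNA\t1001\t6000\t.\t+\t.\tID=tx1;Parent=gene1\n' >> toy.gff3
printf 'chr1\t.\tgene\t9001\t9500\t.\t-\t.\tID=gene2;Name=B\n' >> toy.gff3

# One BED6 line per gene feature: chrom, start (0-based), end, ID, score, strand.
awk 'BEGIN { FS = OFS = "\t" }
     $3 == "gene" {
         id = "."
         if (match($9, /ID=[^;]+/)) id = substr($9, RSTART + 3, RLENGTH - 3)
         print $1, $4 - 1, $5, id, 0, $7
     }' toy.gff3 > toy_gene.bed

cat toy_gene.bed
```

Filtering on `gene` gives one line per gene; filtering on `mRNA` or `transcript` instead would give one line per transcript, so the choice depends on which of the two should be plotted. The lengths in the question would then go to `computeMatrix scale-regions` as `--regionBodyLength 5000 --beforeRegionStartLength 3000 --afterRegionStartLength 3000`, with the BED file passed via `-R`.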