Hello!
I am plotting some heatmaps for a certain histone mark from a ChIP-seq experiment. I am producing the heatmaps using both the computeMatrix and plotHeatmap functions from deeptools.
The regions used for computeMatrix are all coding regions of the mouse genome, sorted in descending order by length.
While I do expect enrichment at the TSS (since in this case I am plotting H3K4me3 signal), I also noticed this unusual "V"-shaped pattern. I wonder whether it is normal or whether there is an issue I am not aware of.
This is my command line for computeMatrix:
computeMatrix reference-point --referencePoint TSS -p 6 -S path_to/sham_h3k4me3.bw path_to/contra_h3k4me3.bw path_to/ipsi_h3k4me3.bw -R grcm39_chipseq/coding_genes_coordinates/coding_genes_coordinates_mm39.txt -b 5000 -a 15000 --skipZeros --sortRegions keep -o path_to/gene_coding_regions_h3k4me3_matrix.gz
This is my command line for plotHeatmap:
plotHeatmap -m path_to/gene_coding_regions_h3k4me3_matrix.gz -o path_to/h3k4me3_gene_coding_heatmap.pdf --sortRegions no --colorList white,red --heatmapWidth 8 --zMin 0 --zMax 60 --heatmapHeight 40 --outFileSortedRegions path_to/h3k4me3_gene_coding_heatmap.bed --samplesLabel "Sham H3K4me3" "Contra H3K4me3" "Ipsi H3K4me3" -z "Coding genes - H3K4me3" --whatToShow 'heatmap and colorbar'
Thanks in advance!
Does your region file have strand information?
Yes it has strand information (+ or -).
Maybe share a few lines of the region file?
Here are the top 10 lines of the region file
I believe deepTools expects
chrom start end name **score** strand
Also, I believe you can have deeptools sort by region length for you, with the benefit of possibly adding a line marking the region end. It probably doesn't make sense for K4me3, but just in case it's useful.
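To make the six-column layout above concrete, here is a minimal Python sketch that rewrites one row of a region table as chrom start end name score strand. The input layout is an assumption, since the actual file is not shown in the thread: the helper `to_bed6` is hypothetical, it takes the position of the strand column as a parameter, and it fills in a placeholder name and a score of 0 when those are not available.

```python
def to_bed6(fields, strand_col, name_col=None, score="0"):
    """Reorder one tab-split region row into BED6 column order.

    fields     : list of column values; the first three must be chrom, start, end
    strand_col : 0-based index of the column holding '+' or '-' (assumed layout)
    name_col   : 0-based index of a name column, or None to build a placeholder
    score      : placeholder score, since the source table may not carry one
    """
    chrom, start, end = fields[0], fields[1], fields[2]
    strand = fields[strand_col]
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    if name_col is None:
        # No name column available: use the coordinates as a stand-in name.
        name = f"{chrom}:{start}-{end}"
    else:
        name = fields[name_col]
    return [chrom, start, end, name, score, strand]


# Example row with strand in the fourth column (an assumed layout).
row = "chr1\t1000\t9000\t-".split("\t")
print("\t".join(to_bed6(row, strand_col=3)))
```

Writing every row through a helper like this puts the strand in the sixth column, which is where a BED6 reader would look for it.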
Thanks for your replies, I will give it a try. However, regarding the columns computeMatrix expects in the regions file, I think it only matters that the first 3 columns are: chrom start end, as a regular bed file. All the other columns should not be taken into account.
Yes, that is true, minimally it only needs chrom start end. However, it can also take into account additional columns when provided in standard formats (BED6 and BED12 is how they refer to it, I believe).
This is important if you want to line up TSS since you need to take into account strand. + strand items will have TSS in the start column, while - strand items will have TSS in the end column.
You can either do this manually by narrowing regions to their actual TSS, or use deepTools' functionality, which will automatically take strand into account if you provide it. I believe that will solve your problem where you see the promoter signal at both TSS and TES. You can also provide a GTF file to deeptools.
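As a rough illustration of the strand rule described above, the Python sketch below picks the TSS column by strand and then computes how far the true TSS sits from the start column when strand is ignored. The helper names are hypothetical, and the link to the "V" pattern is one possible reading rather than a confirmed diagnosis: for minus-strand genes the offset equals the gene length, so in a length-sorted heatmap the promoter signal would drift toward the reference point as genes get shorter.

```python
def tss_position(start, end, strand):
    """Return the TSS coordinate following the convention in the thread:
    the start column for '+' regions, the end column for '-' regions."""
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"unexpected strand value: {strand!r}")


def offset_if_strand_ignored(start, end, strand):
    """Distance from the start column (the reference point used when every
    region is treated as '+') to the strand-aware TSS."""
    return tss_position(start, end, strand) - start


# Two made-up 8 kb genes on opposite strands.
print(offset_if_strand_ignored(1000, 9000, "+"))  # 0: signal lands on the reference point
print(offset_if_strand_ignored(1000, 9000, "-"))  # 8000: signal lands 8 kb downstream
```

With a downstream window of 15000 bp, any minus-strand gene shorter than 15 kb would show its promoter inside the plotted window at a position equal to its length.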
Alright, thanks! I will definitely give it a try :)
Can you share the code on how these regions were obtained and sorted?
I have obtained the coding regions for mm39 through UCSC Table Browser. Then, I simply sorted the regions myself based on the length of each of them (end coordinate - start coordinate).
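For reference, the manual sorting step described above (length taken as end coordinate minus start coordinate, longest first) could look roughly like this in Python. This is only a sketch: the actual script was not posted, `sort_by_length_desc` is a hypothetical name, and the rows are made-up examples.

```python
def sort_by_length_desc(regions):
    """Sort region rows by (end - start), longest first.

    Each row is a sequence whose second and third items are the start and
    end coordinates; other columns are carried along unchanged.
    """
    return sorted(regions, key=lambda r: int(r[2]) - int(r[1]), reverse=True)


# Made-up rows: chrom, start, end, strand.
rows = [
    ("chr1", "100", "600", "+"),    # length 500
    ("chr2", "0", "20000", "-"),    # length 20000
    ("chr3", "50", "3050", "+"),    # length 3000
]
for row in sort_by_length_desc(rows):
    print("\t".join(row))
```

Because the whole row is kept together during the sort, the strand column stays attached to its coordinates.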
Coding regions means what exactly? Exons, or cDNA, the latter without introns?
The regions have been obtained by selecting the following fields on UCSC Table Browser: