Question

How to interpret heatmap using plotheatmap from deeptools?

0

Entering edit mode

16 months ago

Chris ▴ 340

Hi all,

After googling and reading, I am still not sure how to correctly interpret this plot. Would you please help me with this? Like the key take away biological information, the color and number in the plot. Thank you so much!

enter image description here

https://training.galaxyproject.org/training-material/topics/epigenetics/tutorials/atac-seq/tutorial.html.

https://deeptools.readthedocs.io/en/develop/content/tools/plotHeatmap.html.

ATAC-seq deeptools • 4.9k views

ADD COMMENT • link 15 months ago by Chris ▴ 340

score 4 · Accepted Answer · 2023-07-28

4

Entering edit mode

16 months ago

ATpoint 85k

The heatmap indicates by color the amount of signal (whatever this is, RPKM, TPM, normalized counts...) in a windows of +/- 3000bp around the TSS which is the center. The more blue the more signal, the more red the less signal. If you imagine this heatmap as a matrix then the profile plot is simply the colSums of it. What it tells is that TSS generally have more signal than the direct vicinity, and that there are TSS with a lot of signal (top) and little to no signal (bottom).

ADD COMMENT • link 16 months ago by ATpoint 85k

0

Entering edit mode

Thanks ATpoint! Would you explain the position of the blue line which goes before "genes"? The first one is close to bottom meanwhile to second one is on top. I don't know why the first heatmap has a little dip on the line chart.

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Entering edit mode

That's the legend. It's position top vs bottom is arbitrary.

The blue curves represent an aggregate of the heatmap rows, so the dip shows many rows have depleted signal at that region. Depleted signal depends on the what was use for plotting. If it was ATAC-seq, then it usually means less accessibility, while RNA-seq would be less expression, or a for a TF ChIP it would mean lower binding.

Assuming the same regions were plotted in each heatmap, it may be that the top graph is for a histone mark where center of the genomic regions (where 'TSS' is marked) has nucleosome depletion, so you see a drop in the histone mark signal. Then the bottom graph may be corresponding accessibility where the most accessibility aligns to the potential nucleosome depleted region. If these regions are actually TSSs, this would make sense I think.

ADD REPLY • link 16 months ago by rfran010 ★ 1.3k

0

Entering edit mode

Thank you so much for the reply! These are the full bulk ATAC-seq data of a cell I think. The size of each fastq file is around 3Gb. Also the value on the left of the second plot is about 10 times more than the first plot.

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Entering edit mode

These are your data?

These plots are very generic. For more specific interpretations, more details would be needed like experimental design and what signals and regions were input to each plot.

If I assume that these are ATAC-seq signal from two separate conditions plotted over all genes, it sort of looks like one (the top one) didn't work while the bottom one looks like it worked. I would be wary of comparing the samples to each other in this case, but you may be able to still make general comparisons of accessibility between different regions within the sample.

ADD REPLY • link 16 months ago by rfran010 ★ 1.3k

0

Entering edit mode

Yes, it is my data. What do you mean the top didn't work? Does that mean the chromatins in this sample are more close? The first plot is from a control and the second from a diseased sample. Let me know if there is any info I can provide.

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Entering edit mode

It looks like ATAC-seq didn't work for the top sample.

In general, lower values mean less accessibility. However, the top sample probably has low numbers due to low enrichment. I would guess the transposition reaction didn't work for the top sample. Maybe the cells were not dissociated properly. You can look up ATAC seq troubleshooting to see why you might have weak enrichemnt.

ADD REPLY • link 16 months ago by rfran010 ★ 1.3k

0

Entering edit mode

Each condition has two replicates so do you think it has technical issue with the first condition? Also the RNA-seq of this sample also has gene expression lower than other conditions.

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Entering edit mode

I wouldn't be able to guess what is the issue without knowing more details.

I am very doubtful that all genes have inaccessible promoters in the control sample, though.

Also, what do you mean RNA-seq has lower gene expression? Did you normalize with spike-ins?

Are you convinced this difference between results is real? If not, I would recommend going through the QC data from the beginning to see what step may have went wrong.

ADD REPLY • link 16 months ago by rfran010 ★ 1.3k

0

Entering edit mode

So if I plot the whole genome, it will average the signal from all promoters in a cell?

ADD REPLY • link 16 months ago by Chris ▴ 340

0

Entering edit mode

Hi rfran010, this looks OK, right? enter image description here

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Entering edit mode

Sure, but this metric, for me, usually does not indicate how well the whole experiment went.

How many duplicates are in each library. Did you analyze fragment sizes for each library?

ADD REPLY • link 16 months ago by rfran010 ★ 1.3k

0

Entering edit mode

Thank you for your reply! There are two replicates for a condition. All sequence length are 75. Total sequences are about 40m or 10m depend on each library.

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Entering edit mode

Which are 10 and which are 40.

What is the duplicate read rate for each library?

Fragment sizes can be derived from QC (bioanalyzer) of the library or from genomic distances between mapped pairs.

ADD REPLY • link 15 months ago by rfran010 ★ 1.3k

0

Entering edit mode

Hi. Seem the duplicate rate is normal. Replicate 1 has 40m, and replicate 2 has 10m. I use fastqc and multiqc. Is bioanalyzer a free software? enter image description here

ADD REPLY • link 15 months ago by Chris ▴ 340

0

Entering edit mode

in each condition one rep has 40m reads and the other has 10m reads? So 50m total reads per condition?

Is this paired end data?

ADD REPLY • link 15 months ago by rfran010 ★ 1.3k

0

Entering edit mode

Yes, it is pard end. 40m for each end, so total 40 x 2 + 10 x 2 = 100m per condition.

ADD REPLY • link 15 months ago by Chris ▴ 340

0

Entering edit mode

Okay,

Is there any obvious QC difference in the libraries from the control condition compared to libraries from the treatment condition?

To determine fragment sizes, you can use https://deeptools.readthedocs.io/en/develop/content/tools/bamPEFragmentSize.html

ADD REPLY • link 15 months ago by rfran010 ★ 1.3k

0

Entering edit mode

Probably no except this one. The library in diseased is bigger 81m and 45m. enter image description here