After googling and reading, I am still not sure how to correctly interpret this plot. Would you please help me with this? Like the key take away biological information, the color and number in the plot. Thank you so much!
The heatmap indicates by color the amount of signal (whatever this is, RPKM, TPM, normalized counts...) in a windows of +/- 3000bp around the TSS which is the center.
The more blue the more signal, the more red the less signal. If you imagine this heatmap as a matrix then the profile plot is simply the colSums of it. What it tells is that TSS generally have more signal than the direct vicinity, and that there are TSS with a lot of signal (top) and little to no signal (bottom).
Thanks ATpoint! Would you explain the position of the blue line which goes before "genes"? The first one is close to bottom meanwhile to second one is on top. I don't know why the first heatmap has a little dip on the line chart.
That's the legend. It's position top vs bottom is arbitrary.
The blue curves represent an aggregate of the heatmap rows, so the dip shows many rows have depleted signal at that region. Depleted signal depends on the what was use for plotting. If it was ATAC-seq, then it usually means less accessibility, while RNA-seq would be less expression, or a for a TF ChIP it would mean lower binding.
Assuming the same regions were plotted in each heatmap, it may be that the top graph is for a histone mark where center of the genomic regions (where 'TSS' is marked) has nucleosome depletion, so you see a drop in the histone mark signal. Then the bottom graph may be corresponding accessibility where the most accessibility aligns to the potential nucleosome depleted region. If these regions are actually TSSs, this would make sense I think.
Thank you so much for the reply! These are the full bulk ATAC-seq data of a cell I think. The size of each fastq file is around 3Gb. Also the value on the left of the second plot is about 10 times more than the first plot.
These plots are very generic. For more specific interpretations, more details would be needed like experimental design and what signals and regions were input to each plot.
If I assume that these are ATAC-seq signal from two separate conditions plotted over all genes, it sort of looks like one (the top one) didn't work while the bottom one looks like it worked. I would be wary of comparing the samples to each other in this case, but you may be able to still make general comparisons of accessibility between different regions within the sample.
Yes, it is my data. What do you mean the top didn't work? Does that mean the chromatins in this sample are more close? The first plot is from a control and the second from a diseased sample. Let me know if there is any info I can provide.
It looks like ATAC-seq didn't work for the top sample.
In general, lower values mean less accessibility. However, the top sample probably has low numbers due to low enrichment. I would guess the transposition reaction didn't work for the top sample. Maybe the cells were not dissociated properly. You can look up ATAC seq troubleshooting to see why you might have weak enrichemnt.
Each condition has two replicates so do you think it has technical issue with the first condition? Also the RNA-seq of this sample also has gene expression lower than other conditions.
I wouldn't be able to guess what is the issue without knowing more details.
I am very doubtful that all genes have inaccessible promoters in the control sample, though.
Also, what do you mean RNA-seq has lower gene expression? Did you normalize with spike-ins?
Are you convinced this difference between results is real? If not, I would recommend going through the QC data from the beginning to see what step may have went wrong.
Thank you for your reply! There are two replicates for a condition. All sequence length are 75. Total sequences are about 40m or 10m depend on each library.
Thanks ATpoint! Would you explain the position of the blue line which goes before "genes"? The first one is close to bottom meanwhile to second one is on top. I don't know why the first heatmap has a little dip on the line chart.
That's the legend. It's position top vs bottom is arbitrary.
The blue curves represent an aggregate of the heatmap rows, so the dip shows many rows have depleted signal at that region. Depleted signal depends on the what was use for plotting. If it was ATAC-seq, then it usually means less accessibility, while RNA-seq would be less expression, or a for a TF ChIP it would mean lower binding.
Assuming the same regions were plotted in each heatmap, it may be that the top graph is for a histone mark where center of the genomic regions (where 'TSS' is marked) has nucleosome depletion, so you see a drop in the histone mark signal. Then the bottom graph may be corresponding accessibility where the most accessibility aligns to the potential nucleosome depleted region. If these regions are actually TSSs, this would make sense I think.
Thank you so much for the reply! These are the full bulk ATAC-seq data of a cell I think. The size of each fastq file is around 3Gb. Also the value on the left of the second plot is about 10 times more than the first plot.
These are your data?
These plots are very generic. For more specific interpretations, more details would be needed like experimental design and what signals and regions were input to each plot.
If I assume that these are ATAC-seq signal from two separate conditions plotted over all genes, it sort of looks like one (the top one) didn't work while the bottom one looks like it worked. I would be wary of comparing the samples to each other in this case, but you may be able to still make general comparisons of accessibility between different regions within the sample.
Yes, it is my data. What do you mean the top didn't work? Does that mean the chromatins in this sample are more close? The first plot is from a control and the second from a diseased sample. Let me know if there is any info I can provide.
It looks like ATAC-seq didn't work for the top sample.
In general, lower values mean less accessibility. However, the top sample probably has low numbers due to low enrichment. I would guess the transposition reaction didn't work for the top sample. Maybe the cells were not dissociated properly. You can look up ATAC seq troubleshooting to see why you might have weak enrichemnt.
Each condition has two replicates so do you think it has technical issue with the first condition? Also the RNA-seq of this sample also has gene expression lower than other conditions.
I wouldn't be able to guess what is the issue without knowing more details.
I am very doubtful that all genes have inaccessible promoters in the control sample, though.
Also, what do you mean RNA-seq has lower gene expression? Did you normalize with spike-ins?
Are you convinced this difference between results is real? If not, I would recommend going through the QC data from the beginning to see what step may have went wrong.
So if I plot the whole genome, it will average the signal from all promoters in a cell?
Hi rfran010, this looks OK, right?
Sure, but this metric, for me, usually does not indicate how well the whole experiment went.
How many duplicates are in each library. Did you analyze fragment sizes for each library?
Thank you for your reply! There are two replicates for a condition. All sequence length are 75. Total sequences are about 40m or 10m depend on each library.
Which are 10 and which are 40.
What is the duplicate read rate for each library?
Fragment sizes can be derived from QC (bioanalyzer) of the library or from genomic distances between mapped pairs.
Hi. Seem the duplicate rate is normal. Replicate 1 has 40m, and replicate 2 has 10m. I use fastqc and multiqc. Is bioanalyzer a free software?
in each condition one rep has 40m reads and the other has 10m reads? So 50m total reads per condition?
Is this paired end data?
Yes, it is pard end. 40m for each end, so total 40 x 2 + 10 x 2 = 100m per condition.
Okay,
Is there any obvious QC difference in the libraries from the control condition compared to libraries from the treatment condition?
To determine fragment sizes, you can use https://deeptools.readthedocs.io/en/develop/content/tools/bamPEFragmentSize.html
Probably no except this one. The library in diseased is bigger 81m and 45m.
Is it control or disease sample in the picture?
So, control samples = 10m and 40m reads disease sample = 81 and 45m reads ?
The plot is diseased. Yes, it is. Thank you for the follow up!