Question

Issue understanding ATAC-seq PCA

0

Entering edit mode

3 months ago

Ronin ▴ 10

I have an ATAC-seq experiment and generated a PCA of my samples. I am dealing with three different tissues in a non-model organism. My issue is I do not understand what (if any) issues exist in my PCA. It does not make sense to me. Here is the PCA I generated of all 18 samples (6 samples per tissue):

PCA of 18 samples

This is the code I used:

multiBamSummary bins --bamfiles ATAC_01_mapped.sorted.rem_dup.bam ATAC_03_mapped.sorted.rem_dup.bam ATAC_04_mapped.sorted.rem_dup.bam ATAC_05_mapped.sorted.rem_dup.bam ATAC_07_mapped.sorted.rem_dup.bam ATAC_08_mapped.sorted.rem_dup.bam ATAC_13_mapped.sorted.rem_dup.bam ATAC_15_mapped.sorted.rem_dup.bam ATAC_16_mapped.sorted.rem_dup.bam ATAC_17_mapped.sorted.rem_dup.bam ATAC_19_mapped.sorted.rem_dup.bam ATAC_20_mapped.sorted.rem_dup.bam ATAC_25_mapped.sorted.rem_dup.bam ATAC_27_mapped.sorted.rem_dup.bam ATAC_28_mapped.sorted.rem_dup.bam ATAC_29_mapped.sorted.rem_dup.bam ATAC_31_mapped.sorted.rem_dup.bam ATAC_32_mapped.sorted.rem_dup.bam -o ./ATACresults_picard.npz --outRawCounts readCounts.tab --centerReads

plotPCA -in ATACresults_picard.npz --rowCenter -o all_tissues-pca.png

So my assumption would be that the PCA would show three clusters, corresponding to each of the three tissue types. But instead we see this more-or-less straight line.

I've also generated a heatmap, which does indeed show clustering based on tissue type:

heatmapATAC

So I have these three regions that are perfectly segregating based on the sample, yet my PCA does not reflect this. Are there some issues in my code, or is this PCA actually informative in some way?

For what it is worth, I also ran similar code, but for each of the three tissue types to generate 3 additional PCAs. In each case, the PCAs were similarly vertical, like so:

spleenPCA

I would appreciate any insights, corrections, etc. that might be offered. And also, mtDNA has been removed, the reads were cleaned and checked prior to any of these analyses. Cheers,

ATAC-seq PCA • 643 views

ADD COMMENT • link updated 3 months ago by LChart 4.7k • written 3 months ago by Ronin ▴ 10

0

Entering edit mode

I do not know the tool that generated this, but my suggestion is to go the usual way: Call peaks, make a consensus peak set, from this create a count matrix, load into R, normalize with something like DESeq2 or edgeR, get normalized counts on log2 scale and then do a PCA with the top (for example) 1000 most variable regions.

ADD REPLY • link 3 months ago by ATpoint 86k

0

Entering edit mode

This was done using deeptools. I'll try using featurecounts to get the matrix and try an edgeR tutorial. I believe edgeR is used more than DESeq2 for ATACseq, but I'll check to make sure.

ADD REPLY • link 3 months ago by Ronin ▴ 10

0

Entering edit mode

I believe edgeR is used more than DESeq2 for ATACseq, but I'll check to make sure

Their inference is similar, it's a metter of personal preference.

ADD REPLY • link 3 months ago by ATpoint 86k

score 0 · Answer 1 · 2024-09-28

It would be best if you marked the tissues with either identical symbols or colors so that you can tell which group is where

The PCA plot's job is to show whether the data has internal structure and consistency across conditions.

The data you show do not seem to indicate well-defined groups, but you should replot the way I mentioned so that more conclusions can be drawn.

That being said, your plots are a little worrisome ...

score 0 · Answer 2 · 2024-09-28

I know you specified the --rowCenter flag in the post - but double-check that you actually specified it when producing the plot. A 95% variance explained first eigenvalue, and no sign difference between the eigenvector components, pretty much means the PCA was performed on a matrix with all positive entries - i.e., that the data was not row-centered.