PCA plots in ATAC-seq replicates: Is it okay to use vst transformed data for visualization?
4 days ago
maplewj ▴ 20

Hi everyone,

I'm fairly new to ATAC-seq analysis and learning everything as I go.

I'm currently analyzing ATAC-seq data processed using the PEPATAC pipeline, followed by differential accessibility analysis using DiffBind. I performed triplicate experiments for each treatment group (A, B, C), and most replicates cluster well — except for one sample in the B group (B_rep3), which consistently appears separated in the PCA plot.

Here’s what I did:

  • Preprocessing: PEPATAC pipeline
  • Peak calling: Iterative merging to get consensus peak set
  • Read counting: dba.count() on iterative peaks
  • Normalization: dba.normalize() with native method and background bin size
  • Analysis: dba.analyze() with both DESeq2 and edgeR
  • PCA: dba.plotPCA() on raw, normalized, and analysis objects

The issue:
In the PCA plot generated by dba.plotPCA(), B_rep3 clusters apart from B_rep1 and B_rep2. However, based on preseq plots, B_rep3 actually has the best library complexity and saturation.

FRiP and read counts (before normalization):

  • B_rep1: 13M reads, FRiP 0.23
  • B_rep2: 12M reads, FRiP 0.25
  • B_rep3: 26M reads, FRiP 0.29

I assumed normalization would correct for this, but B_rep3 still clusters away.
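
For a rough sense of scale, the numbers above can be converted into reads in peaks per sample. This is a minimal Python sketch, assuming FRiP is the fraction of all counted reads that fall inside peaks and treating the rounded read totals from the list as exact. The names `samples`, `in_peaks` and `relative` are illustrative and not part of DiffBind or PEPATAC.

```python
# Read totals and FRiP values as quoted in the post (rounded, so results are approximate).
samples = {
    "B_rep1": {"reads": 13_000_000, "frip": 0.23},
    "B_rep2": {"reads": 12_000_000, "frip": 0.25},
    "B_rep3": {"reads": 26_000_000, "frip": 0.29},
}

# Assumption: FRiP = reads in peaks / total reads, so reads in peaks = reads * FRiP.
in_peaks = {name: s["reads"] * s["frip"] for name, s in samples.items()}

# Express each sample relative to the one with the fewest reads in peaks.
reference = min(in_peaks.values())
relative = {name: value / reference for name, value in in_peaks.items()}

for name in samples:
    print(f"{name}: {in_peaks[name] / 1e6:.2f}M reads in peaks, "
          f"{relative[name]:.2f}x the smallest sample")
```

Under these assumptions B_rep3 contributes roughly 2.5 times as many reads in peaks as either of the other two replicates, which is the depth difference that normalization has to absorb.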

Interestingly, when I apply DESeq2's vst() transformation to the raw count matrix and plot PCA, the replicates cluster perfectly — but I know vst is usually for RNA-seq, not ATAC-seq.

My questions:

  1. Is it okay to include vst-based PCA plots for visualization in ATAC-seq, even if not used downstream?
  2. Would reviewers criticize the PCA separation, or is it fine as long as QC (preseq, FRiP) is strong?
  3. Would it make sense to include QC plots in the Supplement to justify keeping the sample?
  4. Why might vst give better clustering, and is it acceptable to show both?

Thanks so much in advance — any thoughts or similar experiences would help a lot!

ATAC-seq diffbind • 364 views
4 days ago
ATpoint 87k

but I know vst is usually for RNA-seq

It's just a transformation that decouples the variance from the mean, so you can use it here with no problem. ATAC-seq data are also just counts, same as RNA-seq. If your vst PCA looks fine, then call it a day. After all, it's a better transformation than plain log2 because it dampens the noise that log2 inflates in low-count regions.
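
To illustrate what decoupling the variance from the mean looks like, here is a minimal Python sketch. It is not DESeq2's vst(), which fits a dispersion trend for negative binomial counts. As a simpler stand-in it assumes pure Poisson counts and uses the Anscombe transform 2*sqrt(k + 3/8), then compares the exact variance of raw, log2(k + 1) and Anscombe-transformed counts at several mean values. All function names are hypothetical.

```python
import math

def poisson_pmf_table(mu, tail_sd=12):
    """(k, P(K = k)) pairs for Poisson(mu), truncated far out in the upper tail."""
    upper = int(mu + tail_sd * math.sqrt(mu) + 20)
    table = []
    for k in range(upper + 1):
        # Work in log space so large k and mu do not overflow.
        log_p = -mu + k * math.log(mu) - math.lgamma(k + 1)
        table.append((k, math.exp(log_p)))
    return table

def transformed_variance(mu, transform):
    """Exact variance of transform(K) for K ~ Poisson(mu), by summing over the pmf."""
    table = poisson_pmf_table(mu)
    mean = sum(p * transform(k) for k, p in table)
    return sum(p * (transform(k) - mean) ** 2 for k, p in table)

def log2_pseudocount(k):
    return math.log2(k + 1)

def anscombe(k):
    # Classical variance-stabilizing transform for Poisson counts (stand-in for vst).
    return 2.0 * math.sqrt(k + 3.0 / 8.0)

means = [5, 20, 100, 1000]
raw_var = {mu: transformed_variance(mu, float) for mu in means}
log2_var = {mu: transformed_variance(mu, log2_pseudocount) for mu in means}
vst_var = {mu: transformed_variance(mu, anscombe) for mu in means}

for mu in means:
    print(f"mean={mu:>5}  raw={raw_var[mu]:9.2f}  "
          f"log2={log2_var[mu]:.4f}  stabilized={vst_var[mu]:.4f}")
```

In this toy setting the raw variance grows with the mean, the log2 variance is largest for low counts, and the stabilized variance stays close to constant across the range. That is why a PCA on log2 values can be dominated by noisy low-count regions.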

Would reviewers criticize the PCA separation, or is it fine as long as QC (preseq, FRiP) is strong?

Frankly, most reviewers only look at results.

Would it make sense to include QC plots in the Supplement to justify keeping the sample?

I would simply include the vst PCA in the supplement to show that the experiment has good quality, together with a spreadsheet of QC metrics such as depth, number of peaks, FRiP, etc.

Why might vst give better clustering, and is it acceptable to show both?

It separates noise from signal better than a log2 transformation, so the other PCA might show more noise. Just use vst and call it a day. After all, QC is always an integration of many aspects, like FRiP and looking at the peaks in IGV. If all of this says the experiment is OK, then proceed. Just visualize the differential regions on a heatmap later, and if that looks convincing as well, it's fine. Don't overfilter for no reason.


Thank you so much for your kind and clear explanation, this really helped clarify what I was unsure about! I truly appreciate you taking the time to share your insight. Wishing you all the best in your research and everything ahead!


Don't ask me why but the spam bot got triggered and suspended you. I reinstated you, not sure why that happened.


The tone was too nice :-) The bot thought it couldn't possibly be about bioinformatics
