Question

Method of combining replicate data for signal track visualization



Asked 17 months ago by Orange

Hi all

Is there a widely accepted method for combining biological replicates from ATAC-seq (or other high-throughput sequencing data) into a single file per treatment/condition for signal track visualization?

The following are possible strategies I have seen:

1) Concatenate the alignment (BAM) files of all replicates into a single BAM file, then create bedGraph and bigWig files from it
2) Create bedGraph/bigWig files separately for each replicate, then combine them into a single bigWig using the wiggletools mean function

Because replicates have different read depths, won't replicates with more reads be over-represented with the first method? My understanding is that the signal is normalized when bedGraph/bigWig files are created. Is calculating the mean of the normalized signals (the second method) therefore better?
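To make the depth concern concrete, here is a toy calculation with made-up numbers for a single genomic bin. It assumes counts per million (CPM) as a stand-in for whatever normalization the track-building tool applies (not every tool normalizes by default), and the function names are illustrative only:

```python
def cpm(count, library_size):
    """Counts per million: scale a raw count by the library size."""
    return count * 1e6 / library_size

def pooled_cpm(counts, library_sizes):
    """Strategy 1): merge the BAMs, then normalize the pooled count once."""
    return cpm(sum(counts), sum(library_sizes))

def mean_of_cpms(counts, library_sizes):
    """Strategy 2): normalize each replicate, then take the plain mean."""
    values = [cpm(c, n) for c, n in zip(counts, library_sizes)]
    return sum(values) / len(values)

# Hypothetical bin: replicate A was sequenced 3x deeper than replicate B.
counts = [300, 200]                       # raw reads in the bin
library_sizes = [30_000_000, 10_000_000]  # total mapped reads per replicate

print(pooled_cpm(counts, library_sizes))    # 12.5
print(mean_of_cpms(counts, library_sizes))  # 15.0
```

The per-replicate CPMs are 10 and 20. Pooling gives (30M x 10 + 10M x 20) / 40M = 12.5, i.e. a depth-weighted mean in which the deeper replicate dominates, whereas averaging normalized tracks weights both replicates equally.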

Thanks for your inputs!

Tags: ATAC-seq, UCSC, IGV • 1.2k views

Last updated 17 months ago by Maximilian Haeussler • written 17 months ago by Orange

score 2 · Accepted Answer · 2024-01-05

I always do 2). First, make individual bedGraphs and normalize each file (see the links below). Essentially, this means dividing the 4th column of each bedGraph by an appropriate scaling factor. Then make a new bedGraph with the average of the individual ones. I like bedtools unionbedg for this: it merges the files into one table with one value column per sample, which you can then average. Finally, convert the result to bigWig.
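The scale-then-average step can be sketched in plain Python. In practice you would use bedtools unionbedg plus a one-line average, so treat this as an illustration of the logic rather than a replacement. It assumes intervals within one file do not overlap and, like unionbedg's default filler, counts a missing interval as 0. The helper names and scaling factors are hypothetical:

```python
import bisect
from collections import defaultdict

def read_bedgraph(lines, scaling_factor):
    """Parse bedGraph lines, dividing the 4th (value) column by scaling_factor."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank and header lines
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value) / scaling_factor))
    return records

def index_track(records):
    """Group one track's intervals by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, value in records:
        by_chrom[chrom].append((start, end, value))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom

def value_at(intervals, starts, pos):
    """Return (value, covered) for the interval containing pos; 0.0 if none."""
    i = bisect.bisect_right(starts, pos) - 1
    if i >= 0 and intervals[i][1] > pos:
        return intervals[i][2], True
    return 0.0, False

def mean_bedgraph(tracks):
    """Average scaled tracks over the union of their interval boundaries.

    A track with no interval at a position contributes 0 (unionbedg-style
    filler). Assumes intervals within a single track do not overlap.
    """
    indexed = [index_track(t) for t in tracks]
    merged = []
    for chrom in sorted({c for idx in indexed for c in idx}):
        per_track = [idx.get(chrom, []) for idx in indexed]
        starts = [[s for s, _, _ in ivs] for ivs in per_track]
        edges = sorted({p for ivs in per_track for s, e, _ in ivs for p in (s, e)})
        for left, right in zip(edges, edges[1:]):
            hits = [value_at(ivs, st, left) for ivs, st in zip(per_track, starts)]
            if not any(covered for _, covered in hits):
                continue  # gap in every track: skipped, as unionbedg does by default
            merged.append((chrom, left, right, sum(v for v, _ in hits) / len(hits)))
    return merged

# Toy replicates with hypothetical scaling factors (e.g. computed in R beforehand).
rep1 = ["chr1\t0\t100\t4", "chr1\t100\t200\t8"]
rep2 = ["chr1\t50\t150\t6"]
tracks = [read_bedgraph(rep1, 2.0), read_bedgraph(rep2, 3.0)]
for record in mean_bedgraph(tracks):
    print(*record, sep="\t")
```

The output is sorted by chromosome name and then start coordinate, which is the order bedGraphToBigWig expects, so it can be written to a file and converted to bigWig directly.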

Normalization of BigWig files using TMM from edgeR

ATAC-seq sample normalization

The advantage of 2) is that each file is properly normalized, so each sample contributes roughly equally to the average. Since I calculate the scaling factors in R up front, I can easily combine this step with some QC, such as PCA, to identify and possibly remove outliers.
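The answer does this QC in R; as a rough illustration of the idea, here is a NumPy sketch of a sample-level PCA. It assumes a regions x samples count matrix (for example, reads in peaks) and uses log2(CPM + 1) as the input transform, which is an assumption and not necessarily what the answer uses. The tiny matrix is invented purely for demonstration:

```python
import numpy as np

def pca_samples(counts, n_components=2, pseudocount=1.0):
    """Project samples onto their top principal components.

    counts: regions x samples matrix of raw read counts.
    Counts are converted to log2(CPM + pseudocount), each region is centered
    across samples, and the SVD gives per-sample PC scores.
    """
    counts = np.asarray(counts, dtype=float)
    log_cpm = np.log2(counts / counts.sum(axis=0) * 1e6 + pseudocount)
    centered = log_cpm - log_cpm.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = vt.T[:, :n_components] * s[:n_components]  # samples x components
    explained = s**2 / np.sum(s**2)                    # variance fraction per PC
    return scores, explained[:n_components]

# Hypothetical counts for 4 regions (rows) in 4 samples (columns);
# the last sample has a clearly different accessibility profile.
region_counts = [
    [100, 110, 105, 10],
    [200, 190, 210, 400],
    [50, 55, 52, 300],
    [300, 290, 310, 20],
]
scores, explained = pca_samples(region_counts)
for i, (pc1, pc2) in enumerate(scores):
    print(f"sample {i}: PC1={pc1:.2f} PC2={pc2:.2f}")
print("variance explained:", np.round(explained, 3))
```

A replicate that sits far from the others on PC1 (the sign of each PC is arbitrary) is a candidate outlier to inspect before it is averaged into the condition track.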

The downside of 1) is that you would need to normalize the individual BAM files so that each contributes equally to the merged file, which I find hard to do at the BAM level.