Question

Bigwig format and operation

2

6.0 years ago

oriolebaltimore ▴ 190

Dear users, I have read through the UCSC wiggle and bigWig format documentation in depth. I understand variable step and fixed step, and how bigWig files are used for visualization in a browser. However, I am not clear on how ChIP-Seq and ATAC-Seq data are represented in bigWig format.

Below is an example of the output after converting a bigWig file with bigWigToWig. The data are from ATAC-Seq.

#bedGraph section chr1:0-870999
chr1    0       9999    0
chr1    9999    10099   16.561
chr1    10099   10199   24.2045
chr1    10199   10299   2.54784
chr1    10299   10399   5.09568
chr1    10399   10499   11.4653
chr1    10499   10599   7.64352
chr1    10599   10699   3.82176
chr1    10699   13199   0

In this context, could someone help me understand:

1. What is the real-number value in the 4th column? Is it the read depth for that position, or some transformed value of read depth? In ChIP-Seq or ATAC-Seq, what value would one typically put in this column?
2. If this is a threshold, how do users specify thresholds for selecting significantly enriched regions? (I am not sure whether any statistical test is associated with significance; I cannot find any reference, but users call it so.)
3. I have 6 ATAC-Seq bigWig files for 6 different samples. How do I find the regions of interest present in at least 4 of the samples?

Thank you for your help.

-Adrian

Bigwig ChIP-Seq ATAC-Seq BigWig • 8.0k views

ADD COMMENT • link 6.0 years ago by oriolebaltimore ▴ 190

0

oriolebaltimore : Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

I am not sure if you added " to try and format the example data. I left them in there. You can edit and take them out, if they are not part of example.

Thank you!

ADD REPLY • link 6.0 years ago by GenoMax 147k

0

Thanks. I added them on purpose for formatting, as I felt this was not code and saw no other formatting option for the sample lines from the wig file.

ADD REPLY • link 6.0 years ago by oriolebaltimore ▴ 190

0

I cannot comment on the specifics of ATAC-Seq but your data is actually in bedGraph format, as described here. The value in the 4th column is the Y-axis value for a graph where the X-axis values are the range between columns 2 and 3.

ADD REPLY • link 6.0 years ago by vkkodali_ncbi ★ 3.8k

0

I also responded to another similar post and asked the same question in the same forum. I apologize if that violates the duplicate-question rules.

ADD REPLY • link 6.0 years ago by oriolebaltimore ▴ 190

score 6 · Accepted Answer · 2018-12-16

The bedGraph (or bigWig) content always has the same layout: chromosome, start, end, value. The value can be anything that can be associated with the stretch of DNA defined by the first three columns. It can be the raw read count for that interval, a normalized read count such as reads per million, an enrichment score for this experimental condition over a control experiment, the mean methylation score, the GC content, etc. Most commonly, people use bedGraph/bigWig to create browser tracks displaying the normalized read count across the genome, and in this case it does not matter whether it is ATAC-seq, ChIP-seq or Whatever-seq. One simply counts the number of reads that cover each base and aggregates adjacent bases with equal coverage into one interval to make the files smaller, so if the first 100 bases of a chromosome have a coverage of 0, one would write:

chr1    0   100 0

instead of 100 intervals like:

chr1    0   1   0
chr1    1   2   0
chr1    2   3   0
(...)
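As a rough illustration of that aggregation step, here is a minimal Python sketch. It assumes the per-base coverage for one chromosome is already available as a plain list; the function name `coverage_to_bedgraph` and the toy data are made up for this example and are not taken from any particular tool.

```python
from itertools import groupby


def coverage_to_bedgraph(chrom, coverage):
    """Collapse per-base coverage into bedGraph intervals (0-based, half-open)."""
    intervals = []
    pos = 0
    # groupby yields one group per run of consecutive identical values
    for value, run in groupby(coverage):
        length = sum(1 for _ in run)
        intervals.append((chrom, pos, pos + length, value))
        pos += length
    return intervals


# Toy input (hypothetical): 100 bases with coverage 0, then 3 bases with coverage 2
toy = [0] * 100 + [2] * 3
for chrom, start, end, value in coverage_to_bedgraph("chr1", toy):
    print(f"{chrom}\t{start}\t{end}\t{value}")
# prints:
# chr1	0	100	0
# chr1	100	103	2
```

The first output line is exactly the single-interval form shown above; real coverage tools do the same thing at genome scale, usually after normalization.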

For statistical analysis, one typically calls peaks (e.g. with MACS) and then builds a count matrix holding the raw counts per peak for each replicate. Significant differences between conditions are then inferred with appropriate statistical frameworks, such as DESeq2, edgeR, csaw etc. Please use the search function and google for differential analysis of ATAC-seq data; there is plenty of material available.
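For the "at least 4 of 6 samples" part of the question, one possible approach is to work on the called peaks rather than on the bigWig signal itself. The sketch below is only an illustration under several assumptions: peaks are given as half-open (start, end) tuples on a single chromosome, peaks within one sample do not overlap each other, and the function name `regions_in_at_least` and all coordinates are invented for the example. Dedicated interval tools are what one would normally use for this in practice.

```python
def regions_in_at_least(peak_sets, min_samples):
    """Return (start, end) regions covered by peaks from >= min_samples samples.

    peak_sets: one list of half-open (start, end) intervals per sample,
    all on the same chromosome, non-overlapping within a sample.
    """
    events = []
    for peaks in peak_sets:
        for start, end in peaks:
            events.append((start, 1))   # a peak opens here
            events.append((end, -1))    # a peak closes here
    # At equal positions, handle starts before ends so abutting peaks stay merged
    events.sort(key=lambda e: (e[0], -e[1]))

    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_samples <= depth:
            region_start = pos
        elif depth < min_samples <= previous and pos > region_start:
            regions.append((region_start, pos))
    return regions


# Hypothetical peak calls for 6 samples on one chromosome
samples = [
    [(100, 200)],
    [(120, 220)],
    [(150, 250)],
    [(180, 300)],
    [(400, 500)],
    [(190, 210), (420, 480)],
]
print(regions_in_at_least(samples, 4))
# prints: [(180, 210)]
```

Regions found this way only say where peaks coincide; whether the signal differs between conditions is still a job for the count-matrix approach described above.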