Dear users,
I have read through the UCSC wiggle and bigWig format documentation in depth. I understand variableStep and fixedStep, and how bigWig files are used for visualization in a genome browser. However, I am not clear on how ChIP-Seq and ATAC-Seq data are represented in bigWig format.
Example of the Wig file after converting from bigwig format using bigWigToWig. This example data is from ATAC-Seq.
In this context, could someone help me understand:
1. What is the real-number value in the 4th column? Is it the read depth for that position, or some transformed value of read depth? Typically, in ChIP-Seq or ATAC-Seq, what value would one represent in this column?
2. If this is a threshold, how do users specify thresholds for selecting significantly enriched regions? (I am not sure whether any statistical test is associated with this significance. I cannot find any reference, but users call it so.)
3. I have 6 ATAC-Seq bigWig files for 6 different samples. How do I find the regions of interest that are present in at least 4 of the samples?
oriolebaltimore : Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
I am not sure if you added " to try and format the example data. I left them in there. You can edit and take them out, if they are not part of example.
Thanks. I added them on purpose for formatting, as I felt this is not code and saw no other formatting option for the sample lines from the wig file.
I cannot comment on the specifics of ATAC-Seq but your data is actually in bedGraph format, as described here. The value in the 4th column is the Y-axis value for a graph where the X-axis values are the range between columns 2 and 3.
The bedGraph (or bigWig) format is always the same: chr, start, end, value. The value can be anything that can be associated with the stretch of DNA defined by the first three columns. It can be the raw read count for that interval, a normalized read count such as reads per million, an enrichment score for this experimental condition over a control experiment, the mean methylation score, the GC content, etc. Most commonly, people use bedGraph/bigWig to create browser tracks displaying the normalized read count across the genome, and in that case it does not matter whether the data is ATAC-seq, ChIP-seq or Whatever-seq. One simply counts the number of reads that cover each base and aggregates adjacent bases with equal coverage into one interval to make the files smaller. So if the first 100 bases of a chromosome have a coverage of 0, one would write:
chr1 0 100 0
instead of 100 intervals like:
chr1 0 1 0
chr1 1 2 0
chr1 2 3 0
(...)
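The aggregation described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular tool implements it: the function name `coverage_to_bedgraph` and the toy `per_base` list are made up for this example, and the coverage is assumed to be already available as one value per base.

```python
from itertools import groupby


def coverage_to_bedgraph(chrom, coverage):
    """Collapse per-base coverage into bedGraph-style intervals.

    `coverage` holds one value per base, starting at position 0.
    Intervals are 0-based and half-open (start included, end excluded),
    matching the chr1 0 100 0 example above.
    """
    intervals = []
    pos = 0
    # groupby yields one group per run of consecutive equal values
    for value, run in groupby(coverage):
        length = sum(1 for _ in run)
        intervals.append((chrom, pos, pos + length, value))
        pos += length
    return intervals


# Hypothetical toy data: 100 uncovered bases, then 20 bases at depth 3
per_base = [0] * 100 + [3] * 20
for chrom, start, end, value in coverage_to_bedgraph("chr1", per_base):
    print(chrom, start, end, value, sep="\t")
```

Each run of identical coverage values becomes a single line, which is why the 4th column can be read as the value shared by every base between columns 2 and 3.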
For statistical analysis, one typically calls peaks (e.g. with MACS) and then builds a count matrix holding the raw read counts for each replicate per peak. Differences between conditions are then tested for significance with an appropriate statistical framework, such as DESeq2, edgeR, csaw etc. Please use the search function and google for differential analysis of ATAC-seq data; there is plenty of material available.
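For the question about regions found in at least 4 of 6 samples, one possible approach, once peaks have been called per sample, is a simple sweep over the peak boundaries. The sketch below is an assumption-laden illustration rather than an established method: the function name `regions_in_at_least` and the toy peaks are hypothetical, peaks are taken to be 0-based half-open intervals, and peaks within a single sample are assumed not to overlap each other. It only reports overlap between samples and is not a statistical test.

```python
def regions_in_at_least(peak_sets, min_samples):
    """Return regions covered by peaks from at least `min_samples` samples.

    `peak_sets` is a list with one entry per sample; each entry is a list
    of (chrom, start, end) peaks, 0-based and half-open.
    """
    events = []
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            events.append((chrom, start, 1))   # a peak opens
            events.append((chrom, end, -1))    # a peak closes
    # Tuple sorting puts closes (-1) before opens (+1) at the same position
    events.sort()

    regions = []
    depth = 0
    current_chrom = None
    region_start = None
    for chrom, pos, delta in events:
        if chrom != current_chrom:
            current_chrom, depth, region_start = chrom, 0, None
        before = depth
        depth += delta
        if before < min_samples <= depth:
            # Re-open the previous region if it ends exactly here
            if regions and regions[-1][0] == chrom and regions[-1][2] == pos:
                region_start = regions.pop()[1]
            else:
                region_start = pos
        elif depth < min_samples <= before:
            regions.append((chrom, region_start, pos))
    return regions


# Hypothetical peaks for 6 samples
samples = [
    [("chr1", 100, 200)],
    [("chr1", 120, 220)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300)],
    [("chr1", 400, 500)],
    [("chr1", 190, 210), ("chr2", 10, 20)],
]
for region in regions_in_at_least(samples, 4):
    print(*region, sep="\t")
```

The resulting regions could then serve as the rows of the count matrix mentioned above, with the actual significance testing left to DESeq2, edgeR or csaw.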
I also responded to another similar post and asked the same question there, in this same forum. I apologize if that violates the rules on duplicate questions.