Hello, I am trying to follow this tutorial, but with bulk ATAC-seq data. It looks like they create a peak matrix with peaks on the y-axis and samples on the x-axis, but I have not figured out what values are stored inside the peak matrix. Is a peak matrix something that is usually created in an ATAC-seq analysis pipeline? If so, what values does it normally contain?
The matrix itself is called "counts", so my thinking is that the integers inside the matrix might be the counts. Are the counts the same as the pileup value that is output by MACS2? My ATAC-seq data is in the form of bed files with the columns chromosome, start, end, length, abs_summit, pileup, -log10(pvalue), fold_enrichment, and -log10(qvalue). Can I simply take the pileup value and use that as the count?
What is your analysis goal? With bulk ATAC-seq data, most of the steps in this tutorial are either unnecessary, overly complicated, or make little sense. What you see is a sparse, compressed matrix; the values are not meant to be examined by eye.
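To illustrate why a sparse, compressed matrix is hard to read by eye, here is a minimal sketch. It assumes a compressed sparse column (CSC) layout, which is one common way such matrices are stored; the tutorial itself does not say which layout it uses. The toy values and the helper `to_csc` are hypothetical.

```python
# Toy dense matrix: rows are peaks, columns are samples/cells (made-up values).
dense = [
    [0, 3, 0],
    [1, 0, 0],
    [0, 0, 2],
]

def to_csc(dense):
    """Convert a dense list-of-rows matrix to CSC-style arrays (hypothetical helper)."""
    n_rows = len(dense)
    n_cols = len(dense[0])
    values = []     # non-zero entries, stored column by column
    row_idx = []    # row index of each stored entry
    col_ptr = [0]   # col_ptr[j]:col_ptr[j + 1] slices out column j
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_idx.append(i)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr

values, row_idx, col_ptr = to_csc(dense)
print(values, row_idx, col_ptr)
```

Only the non-zero entries are kept, and the index arrays are what place them back into rows and columns, so a raw printout of the stored arrays does not look like the table it represents.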
Our overall analysis goal is a meta-analysis of 3 ATAC-seq and 3 RNA-seq datasets. I want to use MOFA to integrate these different bulk omics datasets. This particular tutorial is for single-cell datasets, and I would not do all of its steps (like making the Seurat object) for a bulk dataset.
I am trying to figure out how to format the bulk ATAC data into a matrix that MOFA can use. The matrix does not seem to contain only the peak regions (on the y-axis), because I also see integers inside it. Where do those values come from, and are they derived from the ATAC dataset counts? Or are they just an artifact of this type of matrix and not interpretable by eye? I am trying to understand how I can create a similar matrix for bulk ATAC data.
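For illustration, a peaks-by-samples count matrix for bulk data is commonly built by counting, for each sample, the reads that overlap each region of a shared peak set. The sketch below assumes that approach; it is not taken from the tutorial. The peak coordinates, read intervals, sample names, and the helper `count_matrix` are all hypothetical, and in practice a dedicated read-counting tool would do this step on the alignment files.

```python
# Hypothetical shared peak set: (chromosome, start, end), half-open intervals.
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]

# Hypothetical aligned reads per sample, in the same coordinate convention.
reads = {
    "sampleA": [("chr1", 90, 140), ("chr1", 150, 210),
                ("chr1", 600, 640), ("chr2", 10, 40)],
    "sampleB": [("chr1", 510, 560), ("chr1", 520, 580), ("chr2", 60, 100)],
}

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def count_matrix(peaks, reads):
    """Return (sample names, matrix) with one row per peak, one column per sample."""
    samples = sorted(reads)
    matrix = []
    for chrom, p_start, p_end in peaks:
        row = []
        for sample in samples:
            n = sum(
                1
                for r_chrom, r_start, r_end in reads[sample]
                if r_chrom == chrom and overlaps(r_start, r_end, p_start, p_end)
            )
            row.append(n)
        matrix.append(row)
    return samples, matrix

samples, matrix = count_matrix(peaks, reads)
for peak, row in zip(peaks, matrix):
    print(peak, dict(zip(samples, row)))
```

In this layout each integer is the number of reads from one sample that fall in one peak, so every sample has to be counted against the same set of regions for the rows to line up.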
Here is an example with a bulk RNA dataset, but it does not include any ATAC data: Bulk RNA