I have 3 ATAC-seq conditions, with 2 replicates each, and I want to compare how similar they are to each other. I mapped the reads with BWA, then converted the output BAM files to bigWig format using bamCoverage, normalising with RPKM.
I then used multiBigwigSummary as follows:
./multiBigwigSummary bins -b cellpop1_rep1 cellpop1_rep2 cellpop2_rep1 cellpop2_rep2 cellpop3_1 cellpop3_2 -out cell123.npz --binSize=100
The PCA plot of the resultant count summary looks like this:
Am I right in thinking that my replicates cluster well together, meaning they are quite similar? There are differences along PC2, but the eigenvalue of this second component is minimal. Does that mean the differences between the samples are small, but they do exist?
Rerun plotPCA with --transpose.
What does this do? I don't understand what transpose does to the information.
It ... transposes the data, turning the rows into columns and columns into rows. In processes where the columns and rows are significantly different in meaning, such as when working with data.frames and PCA, this can make a significant difference in the meaning of the output, as opposed to when you're working with data structures such as 2D matrices of numbers.
Basically it does what Ram indicated. The gist is that at the moment PC1 is genomic-position-level changes, which will tend to be huge for ATAC data. That's going to end up masking what you're actually interested in, namely how well your samples actually cluster together. By transposing the matrix you look more at that (this is the standard in things like RNA-seq).
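To make the effect of transposing concrete, here is a rough sketch in Python with NumPy. It is not deepTools' own code, and the data are made up: a toy bins-by-samples matrix with a strong shared per-bin signal, a smaller per-population effect, and small replicate noise. The helper `pca_scores` and all the numbers are assumptions chosen for illustration only.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """PCA via SVD. Rows are observations, columns are variables.

    Returns the projected observations and the fraction of variance
    explained by each of the returned components.
    """
    centered = matrix - matrix.mean(axis=0)  # centre each column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
n_bins = 500

# Hypothetical toy data: a per-bin baseline shared by every sample
# (open vs. closed regions), plus a population effect and replicate noise.
baseline = rng.gamma(shape=2.0, scale=5.0, size=n_bins)
columns = []
for pop in range(3):
    pop_effect = rng.normal(0.0, 1.0, size=n_bins)
    for rep in range(2):
        columns.append(baseline + pop_effect + rng.normal(0.0, 0.2, size=n_bins))
coverage = np.column_stack(columns)  # shape: (bins, samples)

# Untransposed: bins are the observations, so the shared positional
# signal dominates PC1.
scores_bins, var_bins = pca_scores(coverage)

# Transposed: samples are the observations. Centering each bin removes the
# shared baseline, leaving the between-sample differences.
scores_samples, var_samples = pca_scores(coverage.T)

rep_dist = np.linalg.norm(scores_samples[0] - scores_samples[1])
pop_dist = np.linalg.norm(scores_samples[0] - scores_samples[2])
print("PC1 variance fraction, bins as rows:", var_bins[0])
print("PC1/PC2 variance fractions, samples as rows:", var_samples)
print("replicate distance vs. population distance:", rep_dist, pop_dist)
```

In the transposed case each sample becomes a single point, so the distance between two points on the plot reflects how different the two samples are, which is the question being asked here.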