Hello everyone,
I want to make a PCA plot for multiple BAM files. I used deepTools' multiBamSummary to compute signal coverage over genomic bins, then supplied the resulting compressed NumPy array (.npz) file to deepTools' plotPCA. I got the PCA plot; however, I have some questions:
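To inspect the same .npz outside deepTools, a minimal sketch could load the matrix and run a conventional PCA (samples as observations, bins as features) with NumPy. This assumes the archive stores a bins x samples array under the key "matrix" and sample names under "labels" (check `np.load(path).files` for your version). The sketch is only for comparison and is not necessarily the computation plotPCA performs.

```python
import numpy as np


def load_counts(source):
    """Load a multiBamSummary .npz (path or file-like object).

    Assumption: the archive holds a bins x samples array under "matrix"
    and sample names under "labels".
    """
    with np.load(source) as data:
        matrix = np.asarray(data["matrix"], dtype=float)
        labels = [str(label) for label in data["labels"]]
    return matrix, labels


def sample_pca(matrix, n_components=2):
    """PCA with samples as observations and bins as features.

    matrix: bins x samples array of coverage values.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    # Drop bins that contain NaN/inf values, in case any are present
    finite = np.all(np.isfinite(matrix), axis=1)
    x = matrix[finite].T  # samples x bins
    # Center each bin (feature) across samples
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var = s ** 2
    return scores, (var / var.sum())[:n_components]
```

Usage would be `matrix, labels = load_counts("results.npz")` followed by `scores, ratio = sample_pca(matrix)`, where "results.npz" is a placeholder file name; each row of `scores` is one sample's (PC1, PC2) coordinates.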
1) PC1 accounts for the most variance, but this doesn't show in the chart: most of the points line up vertically, with little spread along PC1. Searching online, I found that this is a known behavior of plotPCA. So should I ignore the first PC?
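One general PCA effect that can produce this pattern: if bins are treated as observations and samples as variables, and coverage across bins is strongly shared by all samples, PC1 tends to capture that shared "average profile". Every sample then gets nearly the same PC1 loading, so the samples stack in a vertical line. Centering each bin across samples first removes the shared component. The sketch below demonstrates this on simulated data only (the data, `simulate_bins`, and `sample_loadings` are hypothetical illustrations, not deepTools code).

```python
import numpy as np


def sample_loadings(matrix, row_center=False):
    """Per-sample loadings on the principal components.

    matrix: bins x samples. Bins are observations, samples are variables,
    and each sample (column) is mean-centered. If row_center is True, each
    bin (row) is first centered across samples.
    Returns (loadings as samples x components, explained_variance_ratio).
    """
    x = np.asarray(matrix, dtype=float)
    if row_center:
        x = x - x.mean(axis=1, keepdims=True)
    x = x - x.mean(axis=0)  # center each sample (column)
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    var = s ** 2
    return vt.T, var / var.sum()


def simulate_bins(seed=1, n_bins=2000):
    """Hypothetical data: a strong per-bin signal shared by all four samples,
    plus a weaker effect present only in samples 3 and 4."""
    rng = np.random.default_rng(seed)
    shared = rng.gamma(2.0, 50.0, size=n_bins)
    group_effect = rng.normal(0.0, 5.0, size=n_bins)
    columns = [shared + g * group_effect + rng.normal(0.0, 2.0, size=n_bins)
               for g in (0, 0, 1, 1)]
    return np.column_stack(columns)


m = simulate_bins()
load_raw, ratio_raw = sample_loadings(m)
load_rc, ratio_rc = sample_loadings(m, row_center=True)
```

On this simulated matrix, `load_raw[:, 0]` is close to the same value for all four samples and PC1 explains almost all variance, while `load_rc[:, 0]` has opposite signs for the two groups.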
2) I have heard that data should be normalized before PCA. I don't know whether the data are normalized by multiBamSummary, by plotPCA, or not at all. Are there options to normalize the data in deepTools' multiBamSummary or plotPCA, or does that happen automatically?
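For reference, two common ways to normalize a bins x samples count matrix before PCA can be sketched in NumPy: counts-per-million scaling and DESeq2-style median-of-ratios size factors, each followed by a log2 transform with a pseudocount. This assumes the matrix holds raw per-bin read counts; it illustrates standard techniques and does not describe what multiBamSummary or plotPCA do internally. The function names are hypothetical.

```python
import numpy as np


def median_ratio_size_factors(counts):
    """DESeq2-style median-of-ratios size factors (sketch).

    counts: bins x samples raw counts. Only bins with nonzero counts in
    every sample contribute to the per-bin geometric mean.
    """
    counts = np.asarray(counts, dtype=float)
    positive = np.all(counts > 0, axis=1)
    if not positive.any():
        raise ValueError("no bin has nonzero counts in every sample")
    log_counts = np.log(counts[positive])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


def log_normalize(counts, method="median_ratio", pseudocount=1.0):
    """Scale each sample (column), then log2-transform.

    method: "median_ratio" (size factors) or "cpm" (counts per million).
    """
    counts = np.asarray(counts, dtype=float)
    if method == "cpm":
        scale = counts.sum(axis=0) / 1e6
    elif method == "median_ratio":
        scale = median_ratio_size_factors(counts)
    else:
        raise ValueError(f"unknown method: {method}")
    # The log transform keeps a few very high-coverage bins from dominating PCA
    return np.log2(counts / scale + pseudocount)
```

If one sample is simply sequenced twice as deeply as another, either method maps both to the same normalized values, so the PCA reflects differences in coverage shape rather than sequencing depth.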