13 months ago
Rory Osborne
Hi there,
I'm struggling to view some .bw files in IGV; however, when I load the .bam files used to create those same .bw files into IGV, I do not encounter this issue. I'm unsure whether I have fundamentally misunderstood what a .bw file is.
The code I use to generate the .bw files is:
bamCoverage -b "${file}" -o "${file%.bam*}_normRPKM.bw" --normalizeUsing RPKM
As you can see from the attached image, there appears to be no information in the .bw file, whereas there is in the .bam file which was used to create the .bw file.
Any help would be greatly appreciated!!
What exactly is the issue, i.e. which information do you expect to see?
I would expect to see a density plot similar to the one for the .bam file, just smaller and easier to work with when creating figures, but I'm not sure whether that is the purpose of a .bw file. The key difference is that the raw .bam file is not normalised, whereas the .bw file has been.
A bigWig is just coverage as you see it: nothing more than genomic intervals, each with a coverage value.
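For intuition: dumped to text (for example with the UCSC `bigWigToBedGraph` utility), a bigWig is just interval/value records, one per line (the header row is shown here for clarity only; bedGraph itself has no header):

```
chrom  start  end   coverage
1      0      100   3.5
1      100    250   0.0
1      250    400   1.2
```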
Okay thank you. I was confused because this is literally the example shown on the DeepTools website when using bamCoverage to produce .bw files
Do you have any suggestions as to how I can normalise my data (bam files from a ChIP sequencing experiment) and present them in a way similar to the attached image?
Thanks
Here's my preferred strategy: ATAC-seq sample normalization
Thanks for your suggestion. I got to the bottom of the issue: the chromosomes in the .bw file were named with ENSEMBL-style accessions (CP002684, CP002685, ...) instead of 1, 2, ... Renaming the seqlevels of the .bw files to match the annotation I have loaded in IGV revealed the data. This is strange, given that the .bam file also uses the ENSEMBL naming, yet IGV is able to read and present the alignment.
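For anyone hitting the same mismatch, one way to rename the chromosomes without going through R's seqlevels is to round-trip the bigWig through bedGraph. This is a sketch, not the poster's exact fix: the UCSC `bigWigToBedGraph`/`bedGraphToBigWig` utilities, the file names, and the accession-to-number mapping below are assumptions based on this thread.

```shell
# Round-trip sketch (requires the UCSC tools; commented out because the
# file names are placeholders):
#   bigWigToBedGraph sample_normRPKM.bw sample.bedGraph
#   sed -e 's/^CP002684/1/' -e 's/^CP002685/2/' sample.bedGraph > renamed.bedGraph
#   bedGraphToBigWig renamed.bedGraph renamed.chrom.sizes sample_renamed.bw
# (renamed.chrom.sizes must list the NEW chromosome names and lengths.)

# The rename itself, demonstrated on a tiny inline bedGraph:
printf 'CP002684\t0\t100\t3.5\nCP002685\t0\t50\t1.2\n' > demo.bedGraph
sed -e 's/^CP002684/1/' -e 's/^CP002685/2/' demo.bedGraph
```

The equivalent within R/Bioconductor is to import the track and rename its seqlevels, as done above.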
With respect to your preferred strategy, what is the precise format of object = raw.counts? Is this simply a table of mapped reads for each .bw file? If so, in the case of paired-end sequencing data, would this be the total number of mapped reads, or the total number of mapped read pairs?
Thanks for all your help, it's greatly appreciated
The raw count table is a table of read counts over some reference, for example consensus peaks. You could call peaks per sample and then for each group keep those that are reliable (IDR for example, or peaks called in at least two samples), and then make a consensus by merging peaks of all groups, or call peaks over all conditions. Then count reads over these intervals, for example with featureCounts.
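The consensus-and-count steps above might look roughly like this in shell terms. The peak and sample file names and the exact flags are illustrative, not taken from the thread:

```shell
# Illustrative pipeline (run where bedtools/featureCounts are installed):
#   cat group1.reliable.narrowPeak group2.reliable.narrowPeak \
#     | sort -k1,1 -k2,2n | bedtools merge -i - > consensus_peaks.bed
#   featureCounts -a consensus_peaks.saf -F SAF -p --countReadPairs \
#     -o raw_counts.txt sample1.bam sample2.bam

# featureCounts takes SAF/GTF rather than BED; a minimal BED (0-based,
# half-open) to SAF (1-based, inclusive) conversion, on a tiny inline example:
printf '1\t999\t2000\n2\t4999\t6000\n' > consensus_peaks.bed
awk 'BEGIN{OFS="\t"; print "GeneID","Chr","Start","End","Strand"}
     {print "peak_"NR, $1, $2+1, $3, "."}' consensus_peaks.bed
```

On the paired-end question above: with `-p --countReadPairs`, featureCounts counts fragments (read pairs) rather than individual mates.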
I know it's a bit cumbersome, but in my hands normalization methods based on meaningful size factors, as in the linked tutorial, are much more reliable than simply scaling by the total number of mapped reads.