Hello everyone,
I have 50 BAM files, some of them single-end and some of them paired-end. I want to make a single bigWig file by combining the reads from all of these BAM files.
For this, I merged all the BAM files into a single giant BAM file (700 GB). However, I am running out of memory while sorting this giant BAM file.
- Is there any way I could sort this huge BAM file?
- Is it OK to merge single-end and paired-end BAM files together?
Have you tested this? bedGraph format should have (afaik) non-overlapping adjacent bins, so you would also need to parse the coordinates and transform them. That is easy with bedtools unionbedg (https://bedtools.readthedocs.io/en/latest/content/tools/unionbedg.html), which gives you a proper bedGraph in terms of the coordinates, followed by some awk-fu to sum the coverage values.
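To make the coordinate problem concrete, here is a minimal pure-Python sketch of the same idea: split every track at the union of all breakpoints, then sum the values per resulting bin. It is not what unionbedg plus awk does internally, only an illustration of the logic. The function name `sum_bedgraph_tracks` and the in-memory tuple layout are assumptions made for this example, and it is only practical for small inputs.

```python
from collections import defaultdict


def sum_bedgraph_tracks(tracks):
    """Sum coverage across bedGraph-like tracks whose bins do not line up.

    tracks: list of tracks, each a list of (chrom, start, end, value)
            tuples with 0-based, half-open coordinates.
    Returns a list of non-overlapping (chrom, start, end, summed_value).
    Hypothetical helper for illustration, not part of bedtools.
    """
    # Record +value where an interval starts and -value where it ends.
    deltas = defaultdict(lambda: defaultdict(int))
    for track in tracks:
        for chrom, start, end, value in track:
            deltas[chrom][start] += value
            deltas[chrom][end] -= value

    merged = []
    for chrom in sorted(deltas):
        points = sorted(deltas[chrom])
        running = 0
        # Walk over consecutive breakpoints; the running sum is the
        # combined coverage between one breakpoint and the next.
        for left, right in zip(points, points[1:]):
            running += deltas[chrom][left]
            if running == 0:
                continue  # uncovered gap, nothing to report
            last = merged[-1] if merged else None
            if last and last[0] == chrom and last[2] == left and last[3] == running:
                # Same value as the previous adjacent bin: extend it.
                merged[-1] = (chrom, last[1], right, running)
            else:
                merged.append((chrom, left, right, running))
    return merged


track_a = [("chr1", 0, 10, 2)]
track_b = [("chr1", 5, 15, 3)]
print(sum_bedgraph_tracks([track_a, track_b]))
```

The two input bins overlap only partly, so the result has three bins with breakpoints at 5 and 10.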
Thanks a lot, ATpoint and LChart. I think that in the end I need to normalize the bedGraph by the total number of mapped reads (probably the sum of the coverage signals of all bins in this case).
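A rough sketch of that normalization, assuming the total number of mapped reads is obtained separately and that a per-million scaling is wanted (both are assumptions, as are the function names). Note that summing value times bin width over all bins gives the number of aligned bases rather than the number of reads, so the two are not interchangeable.

```python
def scale_bedgraph(intervals, total_mapped_reads, per=1_000_000):
    """Scale coverage values to 'per million mapped reads'.

    intervals: list of (chrom, start, end, value), 0-based half-open.
    total_mapped_reads: read count supplied by the caller (assumed to
    come from a separate count over all BAM files).
    """
    factor = per / total_mapped_reads
    return [(chrom, start, end, value * factor)
            for chrom, start, end, value in intervals]


def total_covered_bases(intervals):
    """Sum of value * bin width: aligned bases, not reads."""
    return sum((end - start) * value for _, start, end, value in intervals)


bins = [("chr1", 0, 10, 4), ("chr1", 10, 15, 2)]
print(scale_bedgraph(bins, total_mapped_reads=2_000_000))
print(total_covered_bases(bins))
```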
According to the docs at least, -d -bga should be giving per-base coverage for every base, including 0-coverage bases, so the outputs should all line up.
Yes, but in bedGraph, consecutive per-base values with identical coverage get binned into a single interval, so the length of these bins is different between BAM files. Not sure what -d does, never used it, but bedGraph is 0-based by definition so -d is probably ignored.
Regardless, bedtools genomecov does not need sorted files, so you can simply use samtools cat to concatenate all BAMs and then stream that right into genomecov. That saves you from any of the issues I describe.
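To illustrate the binning point, here is a small Python sketch of how per-base depths collapse into bedGraph-style intervals. The function `per_base_to_bedgraph` and its `keep_zero` switch are hypothetical names for this example. The switch only loosely mirrors the difference between reporting zero-coverage regions (as -bga does) and omitting them (as -bg does).

```python
from itertools import groupby


def per_base_to_bedgraph(chrom, depths, keep_zero=True):
    """Collapse runs of identical per-base depth into intervals.

    depths: per-base coverage values starting at position 0.
    Returns (chrom, start, end, depth) tuples, 0-based half-open.
    """
    intervals = []
    pos = 0
    for depth, run in groupby(depths):
        length = sum(1 for _ in run)  # length of this run of equal depth
        if keep_zero or depth != 0:
            intervals.append((chrom, pos, pos + length, depth))
        pos += length
    return intervals


# Two samples over the same six bases end up with different breakpoints.
print(per_base_to_bedgraph("chr1", [0, 0, 1, 1, 1, 2]))
print(per_base_to_bedgraph("chr1", [0, 1, 1, 1, 1, 2]))
```

Both samples cover the same six positions, yet the interval boundaries differ, which is why the values cannot simply be added line by line.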