Making a bigWig from BAM files
Rajendra KC, 21 months ago

Hello everyone,

I have 50 BAM files, some single-end and some paired-end. I want to make a single bigWig file by combining the reads from all of these BAM files.

For this, I merged all the BAM files into a single giant BAM file (700 GB). However, I run out of memory while sorting this giant BAM file.

  1. Is there any way I could sort this huge BAM file?
  2. Is it OK to merge single-end and paired-end BAM files together?
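For reference, two ways to attempt the sort with samtools; the memory, thread count and temporary path below are placeholder assumptions, not tested values. The first bounds the memory samtools sort uses and sends its temporary files to a disk with enough free space. The second avoids sorting the 700 GB file at all: samtools merge takes coordinate-sorted inputs and writes a coordinate-sorted output, so sorting each of the 50 files first (small jobs) and then merging gives a sorted result directly.

```shell
# Option A: sort the merged BAM with bounded memory.
#   -m  approximate memory per sorting thread (data beyond this spills to temp files)
#   -@  additional threads; total memory is roughly -m times the number of threads
#   -T  prefix for temporary files; point it at a disk with enough free space
sort_big_bam() {
    in_bam=$1
    out_bam=$2
    tmp_prefix=$3   # hypothetical, e.g. /scratch/tmp/sortchunk
    samtools sort -m 2G -@ 8 -T "$tmp_prefix" -o "$out_bam" "$in_bam"
}

# Option B: merge BAMs that are each already coordinate-sorted.
# samtools merge keeps the existing sort order, so no separate sort is needed.
merge_sorted_bams() {
    out_bam=$1
    shift
    samtools merge -@ 8 "$out_bam" "$@"
}
```

Usage would look like `merge_sorted_bams merged.bam sample*.sorted.bam`, with hypothetical file names.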
Tags: bam samtools bigwig
LChart, 21 months ago

Is there any reason you need to merge the BAM files before converting to bigWig? You could compute coverage for each BAM separately:

bedtools genomecov -d -bga -ibam "$bam" -g hg38.chromInfo.txt > "$bam.cov"
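A sketch of running that per-BAM step over all 50 files. It assumes each input BAM is already coordinate-sorted, and it keeps only -bga (bedGraph output that includes zero-coverage intervals); output names are derived from the input names and are an assumption.

```shell
# One bedGraph per BAM, written next to the input (sample.bam -> sample.bg).
# -bga: bedGraph output, including zero-coverage intervals.
# No -g here: with -ibam, chromosome lengths come from the BAM header.
per_bam_coverage() {
    for bam in "$@"; do
        bedtools genomecov -bga -ibam "$bam" > "${bam%.bam}.bg"
    done
}
```

Called as `per_bam_coverage /path/to/bams/*.bam` (hypothetical path). Since the files are independent, the loop can also be split across jobs.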

Then you can simply sum the depths, e.g.

paste "$cov1" "$cov2" | awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4 + $8 }' > merged.bg
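The two-file paste generalizes to N files under the same assumption it makes, namely that every file has identical rows (same chrom, start, end, in the same order, 4 tab-separated columns). This sketch sums every fourth column and, instead of silently adding misaligned rows, stops with an error when the coordinates differ between files.

```shell
# Sum the depth column of N coverage files whose rows are assumed identical.
# After paste, file k occupies columns 4k-3..4k, so depths are $4, $8, $12, ...
sum_aligned_coverage() {
    paste "$@" | awk 'BEGIN { FS = OFS = "\t" }
    {
        total = 0
        for (i = 4; i <= NF; i += 4) {
            # stop if this row has different coordinates than the first file
            if ($(i - 3) != $1 || $(i - 2) != $2 || $(i - 1) != $3) {
                print "coordinate mismatch at line " NR > "/dev/stderr"
                exit 1
            }
            total += $i
        }
        print $1, $2, $3, total
    }'
}
```

For example, `sum_aligned_coverage s1.cov s2.cov s3.cov > merged.bg` with hypothetical file names.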

How you want to do this (sequentially, hierarchically, all at once) is up to you.

Then run bedGraphToBigWig to convert the summed bedGraph to bigWig.
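A sketch of the conversion step, assuming the summed file is called merged.bg and reusing hg38.chromInfo.txt as the chromosome-sizes file. bedGraphToBigWig expects its input sorted by chromosome and start, which is what the sort line does; the function and file names are illustrative.

```shell
# Sort a bedGraph by chromosome, then numerically by start, and convert it.
# LC_ALL=C gives plain byte-order sorting of chromosome names.
bedgraph_to_bigwig() {
    in_bg=$1
    chrom_sizes=$2
    out_bw=$3
    LC_ALL=C sort -k1,1 -k2,2n "$in_bg" > "${in_bg}.sorted"
    bedGraphToBigWig "${in_bg}.sorted" "$chrom_sizes" "$out_bw"
}
```

Usage: `bedgraph_to_bigwig merged.bg hg38.chromInfo.txt merged.bw`.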


Have you tested this? As far as I know, bedGraph intervals are non-overlapping and adjacent positions with the same value are merged into one interval, so the rows of the per-file outputs will not line up and you would also need to parse the coordinates and transform them. That is easy with https://bedtools.readthedocs.io/en/latest/content/tools/unionbedg.html, which gives a proper bedGraph in terms of the coordinates, followed by some awk-fu to sum the coverage values.
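A sketch of that unionbedg route. bedtools unionbedg splits the genome at every breakpoint found in any input, so each output row is chrom, start, end followed by one value per input file (0 where a file has no interval); the awk part then adds those values up. The inputs are assumed to be sorted bedGraphs, and the function names are hypothetical.

```shell
# Add up columns 4..NF of unionbedg output: chrom, start, end, value_1 ... value_N.
sum_union_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        total = 0
        for (i = 4; i <= NF; i++) total += $i   # add the per-file values
        print $1, $2, $3, total
    }'
}

# usage: merge_bedgraphs out.bg in1.bg in2.bg ...
merge_bedgraphs() {
    out=$1
    shift
    bedtools unionbedg -i "$@" | sum_union_columns > "$out"
}
```

With 50 inputs the union has 53 columns, but the awk loop does not depend on the file count.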


Thanks a lot, ATpoint and LChart. I think in the end I will need to normalize the bedGraph by the total number of mapped reads (in this case, probably the total coverage signal summed over all bins).
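A sketch of that normalization, reading the bedGraph twice: the first pass computes the total signal as the sum over all intervals of (end - start) × value, the second rescales every value so that this total becomes a chosen constant. Note that this total counts aligned bases rather than reads (roughly reads times aligned read length); the constant itself is an arbitrary assumption.

```shell
# Rescale a bedGraph so that sum((end - start) * value) equals "scale".
normalize_bedgraph() {
    in_bg=$1
    scale=$2
    awk -v scale="$scale" 'BEGIN { FS = OFS = "\t" }
    NR == FNR { total += ($3 - $2) * $4; next }   # first pass: total signal
    {                                              # second pass: rescale rows
        if (total == 0) { print "total signal is zero" > "/dev/stderr"; exit 1 }
        print $1, $2, $3, $4 / total * scale
    }' "$in_bg" "$in_bg"
}
```

For example, `normalize_bedgraph merged.bg 1000000 > merged.norm.bg` (hypothetical names).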


According to the docs, at least, -d -bga should give per-base coverage for every base, including bases with zero coverage, so the outputs should all line up.


Yes, but in bedGraph, adjacent per-base values with identical coverage get merged into one interval, so something like

chr1   1   2   3
chr1   2   3   3
chr1   3   4   3

is displayed as

chr1   1   4   3

so the lengths of these intervals differ between BAM files. I'm not sure what -d does, I've never used it, but bedGraph is 0-based by definition, so -d is probably ignored.
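The merging described above can be reproduced with a small awk filter. This is an illustration of the effect, not bedtools' own code: it collapses consecutive rows (chrom, start, end, value) into one interval when they are on the same chromosome, touch each other, and carry the same value. It is applied here to the three per-base rows from the example.

```shell
# Collapse consecutive per-base rows into bedGraph-style intervals.
collapse_runs() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR > 1 && $1 == run_chrom && $2 == run_end && $4 == run_val {
        run_end = $3          # extend the current run
        next
    }
    {
        if (NR > 1) print run_chrom, run_start, run_end, run_val   # flush previous run
        run_chrom = $1; run_start = $2; run_end = $3; run_val = $4
    }
    END { if (NR > 0) print run_chrom, run_start, run_end, run_val }'
}

printf 'chr1\t1\t2\t3\nchr1\t2\t3\t3\nchr1\t3\t4\t3\n' | collapse_runs
```

The last line prints the single row chr1, 1, 4, 3 (tab-separated), matching the collapsed form shown above.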

Regardless, bedtools genomecov does not need sorted files, so you can simply use samtools cat to concatenate all BAMs and stream the output straight into genomecov. That avoids the issue I describe above.
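A sketch of that streaming pipeline; file names are assumptions. Two caveats are worth checking before running it on 700 GB: samtools cat does not verify that the inputs share an identical sequence dictionary, and the bedtools genomecov help text asks for position-sorted BAM input to -ibam. Comparing the output against per-file runs on a small subset is a cheap way to confirm it behaves as intended with your bedtools version.

```shell
# Concatenate the BAMs to stdout and compute a bedGraph from the stream.
# usage: stream_merged_coverage out.bg in1.bam in2.bam ...
stream_merged_coverage() {
    out_bg=$1
    shift
    samtools cat -o - "$@" | bedtools genomecov -bga -ibam stdin > "$out_bg"
}
```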

21 months ago

Why do it the hard way?

deepTools bamCoverage will make a bigWig for you straight from a BAM, without any awk hacking.

https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html
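bamCoverage reads one indexed BAM at a time, so one way to use it without building the merged file is to run it on each original BAM and write bedGraph instead of bigWig; the per-file outputs can then be combined with bedtools unionbedg as suggested earlier in the thread. A sketch, assuming each BAM is coordinate-sorted; the thread count and output names are placeholders. --binSize 1 matches per-base resolution but produces large files, so a larger bin size may be preferable.

```shell
# Run bamCoverage on each BAM, producing one bedGraph per input.
per_bam_bamcoverage() {
    for bam in "$@"; do
        [ -e "$bam.bai" ] || samtools index "$bam"   # deepTools needs an index
        bamCoverage -b "$bam" -o "${bam%.bam}.bg" \
            --outFileFormat bedgraph --binSize 1 -p 8
    done
}
```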


Here, I am making a single bigWig file from the reads of multiple BAM files (700 GB in total). To use bamCoverage, I would need a single giant BAM file, and with the space available I can't even sort a merged BAM file this big. That's why it was suggested here to make one bedGraph per BAM, add them up into a single bedGraph, and finally convert that bedGraph to bigWig.
