Question

Comparing And Plotting Coverage Between Two Bam Files

3

Entering edit mode

12.5 years ago

Rubal7 ▴ 850

Hi all,

I am trying to make plots of differences in coverage between two bam files. Each bam file has reads from individuals of a population. So bamA has individuals from population A and bamB has individuals from population B. Currently the simplest way I can see to do this is to use Bedtools. I am trying the following:

coverageBed -abam PopulationA.bam -b Chr20PopulationA.bed -hist

where Chr20PopulationA.bed is a bed file I made of 200bp windows of Chr20. The idea here is to calculate coverage by windows for PopulationA then do the same for PopulationB, plot them together and visually see outliers. However my bed file, which looks like this:

chr20      1     201
chr20    201     401
chr20    401     601
chr20    601     801
chr20    801    1001
chr20   1001    1201

etc...

gets the error: Unexpected file format. Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1? I have no idea why this is, so any help understanding this error and how to fix it is appreciated.

Perhaps more importantly I think there are probably better ways to calculate and plot differences in coverage between two bam files, so if anybody could share their experience or expertise with this that would be really appreciated.

Thanks for your help!

bam coverage genome chromosome • 14k views

ADD COMMENT • link updated 5.7 years ago by john.e.major.jr • 0 • written 12.5 years ago by Rubal7 ▴ 850

0

Entering edit mode

I'm also looking into samtools pileup for each file, then plotting the raw depth scores of each popn against the other

ADD REPLY • link 12.5 years ago by Rubal7 ▴ 850

1

Entering edit mode

If it were me, I would run "samtools depth popA.bam popB.bam" and plot the difference. To identify outliers, simply compute the mean and identify loci outside certain std.dev. This is like CGH. Of course, you can also use bedtools. Note that working with differences helps to reduce bias and is thus better than working with two samples separately and then combining them.

ADD REPLY • link 12.5 years ago by lh3 33k

0

Entering edit mode

Thanks for this comment, didn't realise something so straightforward existed, I'm running it now.

ADD REPLY • link 12.5 years ago by Rubal7 ▴ 850

score 4 · Answer 1 · 2014-05-09

Hi,

the recently published deepTools should do what you are searching for. You can normalize your BAM files and compare them as you like in the end you can generate some nice plots. For more information have a look at the deepTools wiki.

As a bonus, deepTools is available in the Galaxy Tool Shed and has a public Test Server. If you are eager to test it locally you can also install our docker image.

Cheers,

Bjoern

score 2 · Answer 2 · 2012-06-10

2

Entering edit mode

12.5 years ago

Sean Davis 27k

This is a similar (the same) problem as finding copy number variants. You might look into software designed for that purpose, of which there are many.

ADD COMMENT • link 12.5 years ago by Sean Davis 27k

0

Entering edit mode

Thanks yes I'm also using CNVNATOR, but wanted something as simplistic as possible to visualise and manually check regions detected as having copy number variants.

ADD REPLY • link 12.5 years ago by Rubal7 ▴ 850

0

Entering edit mode

I wrote https://github.com/seandavi/ngCGH to be very simplistic. It has helper functions to run DNAcopy (circular binary segmentation) on the output if you want that.

ADD REPLY • link 12.5 years ago by Sean Davis 27k

Ram · Answer 3 · 2012-06-12

what I do when I want to compare coverage, not only between 2 files but among any number of them, is to use bedtools:

extract the coverage from each bam in bedgraph format file using bedtools' genomeCoverageBed

genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph
genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph

process all those coverage bedgraph files using bedtools' unionBedGraphs

unionBedGraphs -header -i sample1.bedgraph sample2.bedgraph -names sample1 sample2 -g human_hg19.fa.fai -empty > samples1and2.txt

this way you will end up with a new almost bedgraph file with each sample being a column of that file, and the regions defined would be regions of same coverage among all the files you've processed. I use the -g human_hg19.fa.fai -empty options because I want to see also what regions do not have any coverage at all, but this would be optional for you.

score 0 · Answer 4 · 2012-06-10

0

Entering edit mode

12.5 years ago

Zev.Kronenberg 12k

You could also just use IGV and load up the two BAM files.

ADD COMMENT • link 12.5 years ago by Zev.Kronenberg 12k

0

Entering edit mode

I do this for close inspection of short regions, but for a chromosome level view this doesn't do the job or?

ADD REPLY • link 12.5 years ago by Rubal7 ▴ 850

1

Entering edit mode

at chromosome level, due to IGV's zoom restriction, you'll have to load a coverage file such as bedgraph or bigwig for instance, although the inspection depth would be quite shallow.

ADD REPLY • link 12.5 years ago by Jorge Amigo 14k

0

Entering edit mode

Yes, it is difficult to do at a chromosome level.

ADD REPLY • link 12.5 years ago by Zev.Kronenberg 12k

Ram · Answer 5 · 2019-04-27

Another option is sambamba depth.

sambamba depth window -q=25  -t 1  -w 1000 -c 0 \
./NA12878_wgs/NA12878/NA12878.gatk4-deduped.sort.bam \
./NA24385_wgs/NA24385/NA24385.gatk4-deduped.sort.bam

I'm asking to calculate avg coverage within consecutive windows of 1000bp in size(-w 1000), to include zero coverage regions (-c 0) and only count reads of qual >25 (-q 25) using 1 thread (-t 1).

This will give you one row per window size specified, and a row per BAM specified (you can add 'n' BAMs to this command).

chr1    9000    10000   8   0.013   NA24385 
chr1    9000    10000   6   0.009   NA12878
chr1    10000   11000   766 52.485  NA24385
chr1    10000   11000   495 34.148  NA12878