Hi all,
I am trying to plot differences in coverage between two BAM files. Each BAM file contains reads from the individuals of one population, so bamA has individuals from population A and bamB has individuals from population B. The simplest way I can see to do this is with bedtools. I am trying the following:
coverageBed -abam PopulationA.bam -b Chr20PopulationA.bed -hist
where Chr20PopulationA.bed is a BED file I made of 200 bp windows along chr20. The idea is to calculate coverage per window for PopulationA, do the same for PopulationB, plot the two together, and look for outliers visually. However, my BED file, which looks like this:
chr20 1 201
chr20 201 401
chr20 401 601
chr20 601 801
chr20 801 1001
chr20 1001 1201
etc...
gets the error: "Unexpected file format. Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?" I have no idea why this is, so any help understanding this error and how to fix it is appreciated.
Perhaps more importantly, I suspect there are better ways to calculate and plot differences in coverage between two BAM files, so if anybody could share their experience or expertise with this, that would be really appreciated.
Thanks for your help!
I'm also looking into running samtools pileup on each file, then plotting the raw depth scores of each population against the other.
If it were me, I would run "samtools depth popA.bam popB.bam" and plot the per-position difference in depth. To identify outliers, simply compute the mean of the differences and flag loci that lie more than a chosen number of standard deviations from it. This is similar in spirit to CGH (comparative genomic hybridization). Of course, you can also use bedtools. Note that working with differences helps to reduce bias shared by both samples, and is thus better than analysing the two samples separately and then combining the results.
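To make the mean / standard deviation step concrete, here is a minimal Python sketch. It assumes the two-file "samtools depth" output has been saved as tab-separated lines of chromosome, 1-based position, depth in popA, depth in popB. The function names (parse_depth, find_outliers) and the default cutoff of 3 standard deviations are my own choices for illustration, not anything samtools prescribes.

```python
import statistics


def parse_depth(lines):
    """Yield (chrom, pos, depth_a, depth_b) from two-file `samtools depth` output.

    Assumes tab-separated columns: chrom, position, depth in file 1, depth in file 2.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        chrom, pos, depth_a, depth_b = line.split("\t")[:4]
        yield chrom, int(pos), int(depth_a), int(depth_b)


def find_outliers(records, n_sd=3.0):
    """Return (chrom, pos, diff) for loci whose depth difference (A - B)
    lies more than n_sd standard deviations from the mean difference.

    n_sd is a hypothetical cutoff; tune it for your data.
    """
    records = list(records)
    if len(records) < 2:
        return []
    diffs = [a - b for _, _, a, b in records]
    mean = statistics.mean(diffs)
    sd = statistics.pstdev(diffs)
    if sd == 0:
        return []  # all differences identical, nothing stands out
    return [
        (chrom, pos, diff)
        for (chrom, pos, _, _), diff in zip(records, diffs)
        if abs(diff - mean) > n_sd * sd
    ]
```

For example, after "samtools depth popA.bam popB.bam > depth.tsv" (depth.tsv is a placeholder name), you could call find_outliers(parse_depth(open("depth.tsv"))). As far as I know, samtools depth omits positions with zero coverage in every file unless you pass -a, so the difference is only computed where at least one population has reads. This sketch holds every position in memory, which may be too much for a whole genome; restricting to one chromosome or averaging over windows first would be a reasonable variation.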
Thanks for this comment. I didn't realise something so straightforward existed; I'm running it now.
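On the original bedtools error: judging from the message itself, one likely cause is that the columns in the BED file are separated by spaces rather than tabs (I can't tell from the post whether the spaces are real or just how the forum rendered it). Separately, BED coordinates are 0-based with an exclusive end, so 200 bp windows would normally run 0-200, 200-400, and so on, rather than 1-201, 201-401. Below is a small Python sketch that writes such a file. The function names and the chrom_length argument are hypothetical; the real chromosome length would have to come from your reference index or BAM header.

```python
def make_windows(chrom, chrom_length, window=200):
    """Yield tab-delimited BED lines tiling one chromosome.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    chrom_length is an assumed input (take it from your reference).
    """
    for start in range(0, chrom_length, window):
        end = min(start + window, chrom_length)  # last window may be shorter
        yield f"{chrom}\t{start}\t{end}"


def write_windows(path, chrom, chrom_length, window=200):
    """Write the windows to a BED file, one tab-delimited line per window."""
    with open(path, "w") as out:
        for line in make_windows(chrom, chrom_length, window):
            out.write(line + "\n")
```

For example, write_windows("chr20_windows.bed", "chr20", chrom_length) with the real length of chr20 in your assembly gives a file that can be passed to -b. Recent bedtools versions also include a makewindows subcommand that does the same job, if your installation has it.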