Hello,
I have 3 bam files that I wanted to compare against each other. For example I have reference file with 10,000 sequences. I have paired end reads sequenced for 3 different samples.
1) Sample 1 is 100% same as reference so we expect all reads to map to it 2) Sample 2 is 80% similar to reference so 20% of reference sequences wont have any reads 3) Sample 3 is 60% similar to reference and 40% of reference wont have any reads.
Now my goal is to identify what reference sequences doesnot have any reads mapped in Sample 2 and 3.I need to identify the 20% reference sequences from Sample 2 and 40% from Sample 3.
Also in some cases in a reference which is approx 10kb long, sample 1 maps to entire 10kb, sample 2 maps to first 5kb and sample 3 maps to last 3kb. so I need to identify the partial regions for those reference sequences as well.
I have the mapped sorted bam files for all these three samples. I am looking in to using bedtools but not sure what in bedtools will give the answer I needed.
I have the following commands which might do similar but it outputs differences at every base.
genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph
genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph
unionBedGraphs -header -i sample1.bedgraph sample2.bedgraph -names sample1 sample2 -g reference.fai -empty > samples1and2.txt
The title of your question is incorrect, you are not comparing two bam files. You are trying to find which genomic regions do not have read coverages in various samples. (btw having an incorrect title makes it less likely to get a good answer)
Any suggestions on my question?