*// Edit to make the post more clear (Mapping done via Bowtie2). My problem is that when counting Exome Coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the 2 methods is correct.
1) My first step is to build an .bed file of my Illumina Paired-End reads, returning the positions that only fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].
2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:
Number of exons 214126 with a total length of 45326818
Number of matched nucleotides 10993449.0
Nucleotides/Length*100 24.253740909 % Coverage.
3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:
Number of exons 214126 with a total length of 45326818
Number of matched nucleotides 10576907.0
Nucleotides/Length*100 23.3347661863 % Coverage.
Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?
+1. CalculateHSMetrics is awesome (once you figure out the goofy BED-type format it demands). If you decide to use it and have any trouble, post a comment here and I'll help you out when I see the question.
Dan, I have trouble figuring out the format of interval files. Links lead to that info seem all dead. Please help. Thanks!
Picard has been essentially folded in to the GATK. The new documentation is here: https://broadinstitute.github.io/picard/command-line-overview.html