Entering edit mode
3.8 years ago
skanterakis
▴
130
Does anyone know of an efficient way to calculate error-rate (% bases that mis-match to the reference) from a bam, in a subset of the genome defined by a bed file? By efficient I was hoping for a c/htslib based method. Thank you!
igvtools
will allow you to get a count of bases present at each site from a BAM file. You can subset your BAM before doing that.as far as I can see
igvtools count --bases
counts the occurrence of each base per region. Are you suggesting to then compare to the reference base with another command?Do you want to calculate the sequencing error rate, or naturally occurring polymorphisms (SNPs, indels, and so on)?
The sequencing error rate. I don't mind polymorphisms being included in the calculation though.