Question

Error-rate of a subset of the genome

0

3.8 years ago

skanterakis ▴ 130

Does anyone know of an efficient way to calculate the error rate (% of bases that mismatch the reference) from a BAM file, restricted to a subset of the genome defined by a BED file? By efficient, I mean ideally a C/htslib-based method. Thank you!

WGS error bam bed • 990 views


0

igvtools will allow you to get a count of bases present at each site from a BAM file. You can subset your BAM before doing that.

ADD REPLY • link 3.8 years ago by GenoMax 147k

0

As far as I can see, igvtools count --bases counts the occurrences of each base per region. Are you suggesting that I then compare those counts to the reference base with another command?

ADD REPLY • link 3.8 years ago by skanterakis ▴ 130

0

Do you want to calculate the sequencing error rate, or naturally occurring polymorphisms (SNPs, indels, and so on)?

ADD REPLY • link 3.8 years ago by h.mon 35k

0

The sequencing error rate. I don't mind polymorphisms being included in the calculation though.

ADD REPLY • link 3.8 years ago by skanterakis ▴ 130

score 0 · Answer 1 · 2021-01-28

0

3.8 years ago

skanterakis ▴ 130

It looks like the 5th column of samtools mpileup -f hg38.fa the.bam could be used (it marks reference matches with . or , and mismatches with the observed base), or this tool: https://github.com/genome/bam-readcount
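
For anyone who wants to try the mpileup route, here is a minimal Python sketch of how the 5th column could be turned into a mismatch rate. It is not htslib-based and not tuned for speed; it only illustrates the counting. The function names (count_bases, error_rate) and the file names mentioned afterwards are my own, hypothetical choices. It assumes that N calls count as mismatches, and that deletion placeholders, reference skips and indel strings are left out of both the numerator and the denominator.

```python
import sys


def count_bases(pileup):
    """Return (mismatches, aligned_bases) for one mpileup read-bases string."""
    mismatches = 0
    aligned = 0
    i, n = 0, len(pileup)
    while i < n:
        c = pileup[i]
        if c == "^":
            # Read-start marker; the next character encodes mapping quality.
            i += 2
            continue
        if c in "+-":
            # Indel: sign, decimal length, then that many inserted/deleted bases.
            i += 1
            start = i
            while i < n and pileup[i].isdigit():
                i += 1
            i += int(pileup[start:i])
            continue
        if c in ".,":
            # Match to the reference (forward or reverse strand).
            aligned += 1
        elif c in "ACGTNacgtn":
            # Observed base differs from the reference.
            # Assumption: N is counted as a mismatch.
            aligned += 1
            mismatches += 1
        # '$' (read end), '*'/'#' (deleted base), '<'/'>' (ref skip) are ignored.
        i += 1
    return mismatches, aligned


def error_rate(lines):
    """Mismatch fraction over all pileup lines (tab-separated mpileup output)."""
    mismatches = 0
    aligned = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        m, a = count_bases(fields[4])
        mismatches += m
        aligned += a
    return mismatches / aligned if aligned else float("nan")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mismatch_rate.py pileup.txt
    with open(sys.argv[1]) as handle:
        print(f"{error_rate(handle):.6f}")
```

To restrict the calculation to the BED regions, the pileup can be generated with samtools mpileup -f hg38.fa -l regions.bed the.bam > pileup.txt (the -l option takes a BED or position list file) and then passed to the script as python mismatch_rate.py pileup.txt. Note that mpileup applies its own default filters, for example on base quality, so the result describes the bases that pass those filters rather than every sequenced base.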
