9.4 years ago
Andrew
I want to find the overall expected error rate of reads in a BAM file, as well as the expected number of errors.
Currently, for each base in a read, I take the ASCII value of its quality character (column 11), subtract 33 to get the Phred score Q, and convert it back from the Phred scale as 10^(-Q/10), which to my understanding gives the probability that the base is wrong. I do this for every base in every read, sum the values, and divide the sum by the total number of bases across all reads.
Is this correct? If not how would I calculate the expected error rate?
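That's correct, although note that a BAM file can contain multiple alignments for each read (secondary or supplementary records), so the same read may be counted more than once unless you filter those out.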
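For reference, here is a minimal sketch of the calculation described in the question, written in plain Python over SAM text. It assumes Phred+33 qualities in column 11 and skips secondary (0x100) and supplementary (0x800) records so each read is counted once. The function name `expected_errors` and the `skip_flags` parameter are made up for this illustration, not part of any tool.

```python
import sys


def expected_errors(sam_lines, skip_flags=0x100 | 0x800):
    """Return (expected_errors, total_bases, expected_error_rate).

    sam_lines: iterable of SAM text lines (header lines are ignored).
    skip_flags: FLAG bits to exclude; the default drops secondary and
    supplementary alignments (an assumption, adjust as needed).
    """
    total_errors = 0.0
    total_bases = 0
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        qual = fields[10]
        if flag & skip_flags or qual == "*":  # filtered, or no qualities stored
            continue
        for ch in qual:
            q = ord(ch) - 33  # Phred+33 encoding
            total_errors += 10 ** (-q / 10)  # per-base error probability
        total_bases += len(qual)
    rate = total_errors / total_bases if total_bases else 0.0
    return total_errors, total_bases, rate


if __name__ == "__main__":
    errors, bases, rate = expected_errors(sys.stdin)
    print(f"expected errors: {errors:.4f}")
    print(f"total bases: {bases}")
    print(f"expected error rate: {rate:.6g}")
```

If this were saved as, say, expected_errors.py (a hypothetical file name), it could be fed with `samtools view in.bam | python expected_errors.py`. The sum of the per-base probabilities is the expected number of errors, and dividing by the number of bases gives the expected error rate.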
Hi Andrew, is it possible for you to share the command line that you use for this?
Also, another option for estimating the per-base error rate could be to generate an mpileup file from the BAM file and then process it with command-line tools like grep, cut, sed, awk, etc.
Thanks!