Hello everyone!
I want to check which regions of my gene of interest are more error-prone or more divergent. To do this, I've decided to count the percentage of each type of error per nucleotide of the reference gene. I know this information is stored in the SAM file, but I am having trouble finding software that would let me extract it, e.g. prepare a table containing the number or percentage of substitutions and indels per position in the reference.
For example, given a mapping like this:
positions  123456
reference  ACTCTG
read1      ACTCTG
read2      ACT-TG
read3      ACTCTC
I want to produce a table more or less like this:
position  [nucleotide]  correct  substitutions  deletions
1         A             3        0              0
2         C             3        0              0
3         T             3        0              0
4         C             2        0              1
5         T             3        0              0
6         G             2        1              0
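The counting rule behind this table can be sketched in a few lines of Python. This is only an illustration on the toy alignment above, assuming every read is already padded to the reference length with "-" for deleted bases; the function name count_per_position is made up for this example, and real SAM records would need their CIGAR strings decoded first.

```python
from collections import Counter

def count_per_position(reference, reads):
    """Tally correct bases, substitutions and deletions per reference position.

    Assumes each read is gap-padded to the same length as the reference,
    with '-' marking a deleted base (as in the toy example).
    """
    rows = []
    for i, ref_base in enumerate(reference):
        tally = Counter()
        for read in reads:
            base = read[i]
            if base == "-":
                tally["deletions"] += 1
            elif base == ref_base:
                tally["correct"] += 1
            else:
                tally["substitutions"] += 1
        # Positions are reported 1-based, matching the table.
        rows.append((i + 1, ref_base, tally["correct"],
                     tally["substitutions"], tally["deletions"]))
    return rows

reference = "ACTCTG"
reads = ["ACTCTG", "ACT-TG", "ACTCTC"]
rows = count_per_position(reference, reads)
for row in rows:
    print(*row, sep="\t")
```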
I could try to write my own script to count errors, but I am afraid I would be reinventing the wheel. Does anyone know of software that already does this?
Thanks!
Did you try samtools mpileup?
That's the closest thing I have found; I guess I will just use it.
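For anyone landing here later, the pileup output still has to be turned into counts. Below is a rough Python sketch, assuming the pileup was produced with a reference FASTA (e.g. samtools mpileup -f ref.fa aln.bam, where the file names are placeholders), so that matches show up as "." or "," in the fifth column. The function name parse_pileup_bases is my own, and the parsing follows the pileup format as I understand it: "^" plus one character marks a read start, "$" a read end, "+<n><seq>" and "-<n><seq>" announce an indel after the current position, and "*" (or "#" in newer versions) is a deleted base.

```python
import re

def parse_pileup_bases(bases):
    """Count event types in the base column (5th field) of one mpileup line."""
    counts = {"correct": 0, "substitutions": 0, "deletions": 0, "insertions": 0}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start marker; the next character encodes mapping quality.
            i += 2
            continue
        if c in "+-":
            # Indel announcement: sign, length digits, then that many bases.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            if c == "+":
                counts["insertions"] += 1
            # Deletions are counted via '*' at the deleted positions instead,
            # so the '-<n><seq>' announcement is skipped without counting.
            i += 1 + len(digits) + int(digits)
            continue
        if c in ".,":
            counts["correct"] += 1
        elif c in "*#":
            counts["deletions"] += 1
        elif c in "ACGTNacgtn":
            counts["substitutions"] += 1
        # '$' (read end) and '<' / '>' (reference skips) are ignored.
        i += 1
    return counts

# Hypothetical pileup line: chrom, position, ref base, depth, bases, qualities.
line = "gene1\t4\tC\t3\t.*,\tIII"
chrom, pos, ref_base, depth, bases = line.split("\t")[:5]
result = parse_pileup_bases(bases)
print(pos, ref_base, result["correct"], result["substitutions"], result["deletions"])
```

Insertions are attributed to the position just before the inserted bases, since that is where mpileup reports them. Note also that mpileup applies its own filters and a depth cap by default, so the counts may not cover every read in the file.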