Counting substitutions/indels in SAM per reference nucleotide
6.2 years ago

Hello everyone!

I want to check which regions of my gene of interest are more error-prone or more divergent. To do this, I have decided to count the percentage of each type of error per nucleotide of the reference gene. I know this information is stored in the SAM file, but I am having trouble finding software that would let me extract it, e.g. produce a table containing the number or percentage of substitutions and indels at each position in the reference.

For example, given a mapping like this:

positions 123456
reference ACTCTG

read1     ACTCTG
read2     ACT-TG
read3     ACTCTC

I want to produce a table more or less like this:

position [nucleotide] correct substitutions deletions
1 A 3 0 0
2 C 3 0 0
3 T 3 0 0
4 C 2 0 1
5 T 3 0 0
6 G 2 1 0

I could write my own script to count the errors, but I am afraid that would be reinventing the wheel. Does anyone know of software that does what I am trying to do?

Thanks!

SNP • 1.6k views

Did you try samtools mpileup?


That's the closest thing I have found, so I guess I will just use it.
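
For anyone who wants to turn mpileup output into the kind of table asked for above, here is a minimal Python sketch. It assumes the pileup was produced with a reference (e.g. `samtools mpileup -f ref.fa aln.bam`), so that matches appear as `.` or `,` in the fifth column. The function names `count_pileup_bases` and `pileup_table` are made up for this illustration, and the insertions column is an extra that the table in the question does not have. Treat it as a starting point, not a validated tool.

```python
import re


def count_pileup_bases(bases):
    """Classify the read-base column (5th field) of one mpileup line."""
    counts = {"correct": 0, "substitutions": 0, "deletions": 0, "insertions": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start marker; the next character encodes mapping quality.
            i += 2
        elif ch == "$":
            # Read end marker.
            i += 1
        elif ch in "+-":
            # Indel announced after this position: sign, length, sequence.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            if ch == "+":
                counts["insertions"] += 1
            # Deletions are counted via '*' at the deleted positions instead,
            # so the '-' announcement is only skipped here.
            i += 1 + len(digits) + int(digits)
        elif ch in ".,":
            counts["correct"] += 1
            i += 1
        elif ch in "*#":
            # Placeholder for a base deleted in the read at this position.
            counts["deletions"] += 1
            i += 1
        elif ch.upper() in "ACGTN":
            counts["substitutions"] += 1
            i += 1
        else:
            # Anything else (e.g. '>' or '<' reference skips) is ignored.
            i += 1
    return counts


def pileup_table(lines):
    """Yield (position, reference base, counts) for each mpileup line."""
    for line in lines:
        chrom, pos, ref, depth, bases = line.rstrip("\n").split("\t")[:5]
        yield int(pos), ref, count_pileup_bases(bases)


if __name__ == "__main__":
    import sys

    print("position\tnucleotide\tcorrect\tsubstitutions\tdeletions\tinsertions")
    for pos, ref, c in pileup_table(sys.stdin):
        print(pos, ref, c["correct"], c["substitutions"],
              c["deletions"], c["insertions"], sep="\t")
```

Saved under a name of your choice (say `pileup_counts.py`), it can be fed from a pipe: `samtools mpileup -f ref.fa aln.bam | python pileup_counts.py`. Insertions are attributed to the reference position just before the inserted bases, because that is where mpileup reports them. Dividing each count by the sum for the row gives percentages.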
