Hi,
I have Illumina reads and a reference genome. I aligned the reads to the genome and now have .bam and .sam files. I want a list of the mismatches between the reads and the reference genome, but if the same mismatch occurs in multiple reads, I want it reported only once, together with the number of reads that carry it. For example, if two reads have the mismatch A->G at position 1000 of the reference genome, I want something like this: G,1000,2
I want the output file to be something like this:
Mismatch,Position,Frequency
G,1000,2
A,3278,7
G,78732,5
T,89783,8
C,87494,8
A,13732,5
...
Is there any tool available for doing that?
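To make the request concrete, here is a minimal sketch of the counting in plain Python, not a replacement for a proper tool. It assumes a single-contig reference held in memory as a string, uncompressed SAM text lines as input, and it ignores base qualities and indels. The function names `count_mismatches` and `format_table` are made up for illustration. Note that the CIGAR operation M only means "aligned column" (the base may match or mismatch), so the read base has to be compared with the reference base explicitly.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def count_mismatches(sam_lines, ref_seq):
    """Count (read base, 1-based reference position) pairs that differ from ref_seq.

    Assumption: every read is aligned to the same single reference sequence.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos = int(fields[1]), int(fields[3])
        cigar, seq = fields[5], fields[9]
        if flag & 0x4 or cigar == "*" or seq == "*":
            continue                      # unmapped read or missing data
        ref_i = pos - 1                   # 0-based index into the reference
        read_i = 0                        # 0-based index into the read
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":               # aligned columns: compare base by base
                for k in range(length):
                    ref_base = ref_seq[ref_i + k].upper()
                    read_base = seq[read_i + k].upper()
                    if read_base != ref_base:
                        counts[(read_base, ref_i + k + 1)] += 1
                ref_i += length
                read_i += length
            elif op in "IS":              # insertion / soft clip: consumes read only
                read_i += length
            elif op in "DN":              # deletion / skipped region: consumes reference only
                ref_i += length
            # H and P consume neither the read nor the reference
    return counts


def format_table(counts):
    """Render the counts in the Mismatch,Position,Frequency layout."""
    rows = ["Mismatch,Position,Frequency"]
    for (base, pos), n in sorted(counts.items(), key=lambda kv: (kv[0][1], kv[0][0])):
        rows.append(f"{base},{pos},{n}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    reference = "ACGTACGTAC"
    sam = [
        "@SQ\tSN:chr1\tLN:10",
        "r1\t0\tchr1\t2\t60\t4M\t*\t0\t0\tCGGA\tIIII",
        "r2\t0\tchr1\t3\t60\t3M\t*\t0\t0\tGGA\tIII",
        "r3\t0\tchr1\t1\t60\t2M1I2M\t*\t0\t0\tACTGA\tIIIII",
    ]
    print(format_table(count_mismatches(sam, reference)))
```

In the toy data, reads r1 and r2 both carry G at reference position 4 (where the reference has T), so that mismatch is reported once with frequency 2. For a genome with several chromosomes, the reference name would have to become part of the key.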
Hi Pierre Lindenbaum,
I read all your documentation and all the other posts on Biostars! sam2tsv is a nice tool and serves my need. However, I have trouble understanding what each letter stands for in the Read-Qual column. One important question: what exactly does 'M' mean, a match or a mismatch? If it represents mismatches, then the first line in the output has T in both ref and alt. Shouldn't that be a match? I apologize if my questions are too naive. I think I'm missing a detail here. I would greatly appreciate your help with this.
Thanks in advance!
You should use ADD COMMENT/ADD REPLY to add this comment under @Pierre's answer.