Hi,
I am trying to generate Single nucleotide mutations to a sequence, and trying to get SAM files and its related CIGAR sequences.
My test genome is: (ref.fa)
>genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATA
GATGGGCCCATCGATTGCCATCCCGGGGAAAATCATCATCATCATCATCA
Then I created a fastq sequence (seq_1.fq) which I inserted a Nucleotide A, to the first line and ran the BWA alignment, then the SAM file
bwa mem ref.fa seq_1.fq > seq_1.sam
samtools sort -o seq_1.bam --output-fmt BAM seq_1.sam
samtools view seq_1.bam | sam2pairwise
which shows the insertion by giving the CIGAR seq as 29M1I21M:
genome 0 genome 1 60 29M1I21M * 0 0
CGGACACACAAAAAGAAAGAAGAATTTTTAAGGATCTTTTGTGTGCGAATA
||||||||||||||||||||||||||||| |||||||||||||||||||||
CGGACACACAAAAAGAAAGAAGAATTTTT-AGGATCTTTTGTGTGCGAATA
Question: But when I tried to replace a T using G from 'TTTTT' to 'TTGTT' (seq_2.fq) and ran the BWA align and samtools it does not give me the desired CIGAR sequence !!! BUT it just gives it as 50M
bwa mem ref.fa seq_2.fq > seq_2.sam
samtools sort -o seq_2.bam--output-fmt BAM seq_2.sam
samtools view seq_2.bam | sam2pairwise
genome 0 genome 1 60 50M * 0 0
CGGACACACAAAAAGAAAGAAGAATTGTTAGGATCTTTTGTGTGCGAATA
||||||||||||||||||||||||||||||||||||||||||||||||||
CGGACACACAAAAAGAAAGAAGAATTGTTAGGATCTTTTGTGTGCGAATA
as it does not recognize the mutation which was introduced to the sequence.
So am missing something here ? How can I generate a CIGAR sequence which states, that there is a mismatch or SNP in the sequence ?
Thank you very much.