samtools returns error - cigar and query sequence are of different length even though cigar and query sequence are the same length
1
0
Entering edit mode
3.3 years ago

I have written a program in python that processes and outputs mapped results as either a bam or sam file. It works fine until I have indels within the sequence. when I try to process the result file using samtools, it returns the following error:

samtools [e::sam_parse1] cigar and query sequence are of different length"

...even though the cigar and query sequence are of the same length (see below sample sam lines which returned the error).

mapped_read1    0   chr1    1379078 60  17M1D58M    =   1379078 0   CTATCAATTTTTTGTTTTTTGTTGTTGTTGTTTTTTAAGGATTGTGGGCCGGGTACAGTGACTCACGCCTGTAATC    AAAEEEEEE/EEAEE/EEEEEEEEEEEEEEEEEEEEEEE6<EAE/EEE<AEEEEEEAE/EEEEEEEEAEEEEEEEE    MD:Z:13T3^G58   XG:Z:1  NH:i:1  NM:Z:1  XM:Z:1  XN:Z:0  XO:Z:1  AS:Z:-13    YT:Z:UU
mapped_read2    99  chr1    1654630 60  34M2D24M    =   1654630 211 ATTCTACTTTATTTGTGCAAAATCTTTTTTTTCCTTTTTTTTTTTAGAGGCGGGGTCTTG    EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEE    MD:Z:34^T^T24   XG:Z:2  NH:i:1  NM:Z:0  XM:Z:0  XN:Z:0  XO:Z:1  AS:Z:-11    YT:Z:0
mapped_read3    0   chr1    1684108 60  24M1D43M    =   1684108 0   CACTCAGCAGTCGTGAGACGCTTGCCCAATTGCCTTTTCTTTTAAAGGCCCCTGCCATTTCCCAAACC    AAAEEEEEEEEEEEEAEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEE<EEEEEEEEEAEEEEEEE    MD:Z:24^C37A5   XG:Z:1  NH:i:1  NM:Z:1  XM:Z:1  XN:Z:0  XO:Z:1  AS:Z:-11    YT:Z:UU
mapped_read 0   chr1    1889614 60  47M1D72M    =   1889614 0   ACCAGCCTGGTTAACAGGATCTCTTTTACAAAGAGATCCCATCTCTTAAAAAAAAAAAAAAAAAAGGCTTCCATGAAGATGAATTAAGCAAACAAAAGCTGATACTGCTTCTGCATGCTA    AAAAEEE/EEEEEEEAEEEEEEEEAEEEEEEE/EEAEEEAEEEAEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEE    MD:Z:47^A72 XG:Z:1  NH:i:1  NM:Z:0  XM:Z:0  XN:Z:0  XO:Z:1  AS:Z:-8 YT:Z:UU
mapped_read4    0   chr1    2286565 60  58M1D10M    =   2286565 0   GATTTGAACCCCAAATATAATAATTAAAGTGAGTGTGCATTTTATTCTTACCACATCAGGCAAACGCTG   EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEE   MD:Z:31C26^G10  XG:Z:1  NH:i:1  NM:Z:1  XM:Z:1  XN:Z:0  XO:Z:1  AS:Z:-13    YT:Z:UU
mapped_read5    0   chr1    2946932 60  9M1D59M =   2946932 0   ACAAAACCTGGCCCACCCTGGTGAGGGGAGTTCTAGCCTCTCCCAGGCCCCCATGGCACGATCTAAGTG   AAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE   MD:Z:9^G25C33   XG:Z:1  NH:i:1  NM:Z:1  XM:Z:1  XN:Z:0  XO:Z:1  AS:Z:-13    YT:Z:UU
mapped_read6    0   chr1    4479047 60  42M2D57M    =   4479047 0   GTGTAAAAGTGTTCCTTTTCACCACATCCATGCCAACGTCTATATTTTTTTTTTGGATTTTATATTAATTGGCCATTCTTGCAGGAGTAAGGTGGTATCTC   AAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE//EEEEEEEEEEAAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEE   MD:Z:42^T^T57   XG:Z:2  NH:i:1  NM:Z:0  XM:Z:0  XN:Z:0  XO:Z:1  AS:Z:-11    YT:Z:UU

This result bam/sam file was generated using the pysam module in python. Any help would be appreciated. I have tried using samtools 0.1.18, 1.2, 1.6, 1.13, all with the same error.

sam samtools bam • 2.0k views
ADD COMMENT
1
Entering edit mode
3.3 years ago

The D CIGAR operator represents bases that exist in the reference but not the read: they have been deleted in the read. So the bases counted by D CIGAR operations do not correspond to bases in the SEQ field of the read. (This is what the table in §1.4 of the SAM specification means by “no” in the “Consumes query” column.)

For mapped_read1 for example, the SEQ field contains 76 bases, but the CIGAR string 17M1D58M represents a read with only 17+58 = 75 bases, hence the error.

You don't say what sort of adjustments your Python program is making to the reads, but as a general observation you probably need to look more carefully at what the CIGAR string represents and how you need to adjust it correspondingly.

ADD COMMENT
0
Entering edit mode

Thanks. cigar strings with deletions are clipped by 1. I am fixing it now.

ADD REPLY

Login before adding your answer.

Traffic: 2264 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6