Samtools Error "CIGAR and Query lengths differ" but lengths are the same
2.3 years ago

Hi everyone, I am receiving the following error when I try to apply the samtools depth function:

samtools depth 81_S1.FINAL.bam
[E::bam_read1] CIGAR and query sequence lengths differ for M50243:17:000000000-JWNTH:1:1115:7439:19293

When I look at the indicated records in the BAM file, the CIGAR and sequence lengths appear to match:

M50243:17:000000000-JWNTH:1:1115:7439:19293 99  chrM.fa 31  57  58M120S =   31  120 CACGGGAGCTCTCCATGCATTTGGGATTTTCTTCTTGGGGGTATGCACTCGATAGCATCGCGGTGCGCGTTGCGCTGAGCACCCTATGTCGCAGTATTTGACTTTTTGGTGTGCATTGAAAGCCAGTCACGTCTCAATCTCGTATGTCGGCTTTTTCTTTGCAATAAAAAAATACTTA  ,,AFDCGEDGFG@FGF,89,;EBD,,,;;E;,4,;,;=,@C8,@@C;E,,99=@@=AC,=,+=+4+6@+@++,18+@69,EC AE+8=,,@+@8+,6,,8,,7E+,+++8:++,+3+>+,,,)@+6@8,,+3+,,,2,:=+891;55?++053*,;*3+*0+,*,+*,,7**,33,3,  NM:i:4  MD:Z:47T6G3G12G9    MC:Z:81S120M    AS:i:61 XS:i:36 RG:Z:81 SA:Z:chrM.fa,107,+,99S21M81S,12,0;  CO:Z:Set1_00031_00107-Set1_00031_00107

M50243:17:000000000-JWNTH:1:1115:7439:19293 147 chrM.fa 31  60  77M =   31  -120    CACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATCGCGGGACGCTGGAGCCGG   GGGGGGGGGGGGGGGGGGGGGEGGGGGGGEGFFEGGGGGFGGGGGGGGGGGGGGGFGGGGGGEGGGGGGGGGFDGGF   NM:i:2  MD:Z:81T3A34    MC:Z:81M120S    AS:i:110    XS:i:53 RG:Z:81 CO:Z:Set1_00031_00107-Set1_00031_00107

For the first line, the CIGAR string (58M120S) indicates a length of 178 bp, and the provided sequence is also 178 characters long. For the second line, the CIGAR string (77M) indicates a length of 77 bp, and the provided sequence is 77 characters long. I realize that the two lines do not match each other, but I'm not sure whether this is the issue. I believe the same ID appears twice because I am working with a paired-end short-read sequencing chemistry. If someone could help me figure out what is causing this error, it would be much appreciated.
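
The length check described above can be reproduced without samtools. The following is a minimal sketch in plain Python, assuming the CIGAR strings have been copied out of the records by hand; the function name `cigar_query_length` is hypothetical and not part of samtools or pysam. Per the SAM specification, the operations M, I, S, = and X consume read bases, so their lengths should add up to the length of SEQ.

```python
import re

# CIGAR operations that consume bases of the read (query), per the SAM spec.
QUERY_CONSUMING_OPS = set("MIS=X")

def cigar_query_length(cigar):
    """Return the read length implied by a CIGAR string such as '58M120S'."""
    total = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in QUERY_CONSUMING_OPS:
            total += int(length)
    return total

print(cigar_query_length("58M120S"))  # 178
print(cigar_query_length("77M"))      # 77
```

Both values agree with the sequence lengths counted in the question, which suggests that the CIGAR strings themselves are not the source of the error.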

Thanks in advance!

cigar samtools • 1.7k views

Which version of samtools are you using?


Hi, I have tried versions 1.13 and 1.15. Both give the same error.


The MD tags seem to be incorrect; perhaps that is what the error indicates.

For example, in your second read the CIGAR string is 77M, but the MD tag is MD:Z:81T3A34.

The MD tag indicates 81 matches followed by a mismatch against reference base T, and so on, which adds up to 120 reference bases rather than 77.
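
To make the MD comparison concrete, here is a rough sketch in plain Python that counts the reference bases an MD value describes and compares that with the aligned reference bases in the CIGAR string. The function names are hypothetical (not a samtools or pysam API), and the parser assumes the usual MD grammar: runs of matches as numbers, single mismatched reference bases, and deletions prefixed with `^`.

```python
import re

def md_reference_length(md):
    """Count the reference bases described by an MD value such as '81T3A34'."""
    total = 0
    # Each token is a run of matches, a deletion (^ plus bases), or one mismatch.
    for matches, deletion, mismatch in re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md):
        if matches:
            total += int(matches)
        elif deletion:
            total += len(deletion)
        else:
            total += 1
    return total

def cigar_aligned_reference_length(cigar):
    """Sum the CIGAR operations that the MD tag accounts for: M, =, X and D."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "M=XD")

# Second read from the question: MD covers 120 bases, CIGAR only 77.
print(md_reference_length("81T3A34"), cigar_aligned_reference_length("77M"))
# First read from the question: MD covers 81 bases, CIGAR aligns only 58.
print(md_reference_length("47T6G3G12G9"), cigar_aligned_reference_length("58M120S"))
```

In both records the two numbers disagree, which is consistent with the MD tags having been written for a different alignment than the one the CIGAR string describes.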

How was this BAM file generated?

(Edit: the MC tags are also incorrect)
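
The MC observation can be checked in the same spirit. In the SAM optional-fields specification, MC holds the CIGAR string of the mate, so each read's MC value should equal the other read's CIGAR. The sketch below is illustrative only: the function name `mc_tag_problems` and the dictionary layout are assumptions, and the values are copied by hand from the two records in the question.

```python
def mc_tag_problems(read1, read2):
    """Compare each read's MC tag with its mate's CIGAR; return a list of messages."""
    problems = []
    for name, read, mate in (("read 1", read1, read2), ("read 2", read2, read1)):
        if read["MC"] != mate["CIGAR"]:
            problems.append(
                f"{name}: MC is {read['MC']} but mate CIGAR is {mate['CIGAR']}"
            )
    return problems

# Values copied from the two records in the question.
read1 = {"CIGAR": "58M120S", "MC": "81S120M"}
read2 = {"CIGAR": "77M", "MC": "81M120S"}

for message in mc_tag_problems(read1, read2):
    print(message)
```

Neither MC value matches the mate's CIGAR here, so both reads are reported.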


I see, thanks for this. I am new to CIGAR strings, MD tags, etc., so this is good to know.

The BAM file was generated on our MiSeq FGx (previously made by Illumina, but now made by Verogen) using the default Verogen pipeline. I don't know the details of how Verogen's software generates BAM files, so unfortunately I'm not sure why the MD or MC tags would be incorrect. The MiSeq also generates fastq files that I know I can use to generate my own BAM files, but I am interested in the Verogen-generated BAM files specifically.


I would contact Verogen support for answers about this particular data.


Your first sequence seems to have an invalid QUAL character: ... 18+@69,EC AE+8=,,@+ ... That's either a space or a tab, but I can't tell here. I assume it's a space or some other non-printing character that Biostars has rendered as a space. If you correct this, does the error go away? After replacing it with a comma, I find that samtools accepts your data just fine.

Maybe your sequence line has a space in it and runs into the qual, which is why you get the error. Biostars doesn't preserve white space though (or you lost it during posting), so I'm not sure my file is the same as yours. Tabifying it correctly makes samtools happy though.
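
One way to look for this kind of problem in a SAM text dump is to check each line's field layout and QUAL characters directly. The following is a minimal sketch, not a samtools feature: the function name `check_sam_record` is hypothetical, and the example record is synthetic rather than taken from the poster's file. It relies on two rules from the SAM specification: a record has at least 11 tab-separated mandatory fields, and QUAL may only contain the printable characters `!` through `~` (so a space is not allowed).

```python
def check_sam_record(line):
    """Return a list of problems found in one SAM text line (empty if none)."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        problems.append(
            f"expected at least 11 tab-separated fields, found {len(fields)}"
        )
        return problems
    seq, qual = fields[9], fields[10]
    if qual != "*":
        # QUAL must consist of printable characters in the range '!'..'~'.
        bad = sorted({c for c in qual if not "!" <= c <= "~"})
        if bad:
            problems.append(f"QUAL contains invalid characters: {bad!r}")
        if seq != "*" and len(seq) != len(qual):
            problems.append(
                f"SEQ length {len(seq)} differs from QUAL length {len(qual)}"
            )
    return problems

# Synthetic record with a space inside QUAL (hypothetical data).
record = "\t".join(["r1", "0", "chrM", "1", "60", "4M", "*", "0", "0", "ACGT", "II I"])
print(check_sam_record(record))
```

Running this over the output of `samtools view` for the affected read, if the file can be decoded that far, would show whether the stray character is really in the file or was introduced when the record was pasted.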

I don't think the MD/MC tag will cause this error to be generated.
