Hi everyone, I am receiving the following error when I try to apply the samtools depth function:
samtools depth 81_S1.FINAL.bam
[E::bam_read1] CIGAR and query sequence lengths differ for M50243:17:000000000-JWNTH:1:1115:7439:19293
When I look at the records for the indicated read in the BAM file, the CIGAR and query sequence lengths appear to match:
M50243:17:000000000-JWNTH:1:1115:7439:19293 99 chrM.fa 31 57 58M120S = 31 120 CACGGGAGCTCTCCATGCATTTGGGATTTTCTTCTTGGGGGTATGCACTCGATAGCATCGCGGTGCGCGTTGCGCTGAGCACCCTATGTCGCAGTATTTGACTTTTTGGTGTGCATTGAAAGCCAGTCACGTCTCAATCTCGTATGTCGGCTTTTTCTTTGCAATAAAAAAATACTTA ,,AFDCGEDGFG@FGF,89,;EBD,,,;;E;,4,;,;=,@C8,@@C;E,,99=@@=AC,=,+=+4+6@+@++,18+@69,EC AE+8=,,@+@8+,6,,8,,7E+,+++8:++,+3+>+,,,)@+6@8,,+3+,,,2,:=+891;55?++053*,;*3+*0+,*,+*,,7**,33,3, NM:i:4 MD:Z:47T6G3G12G9 MC:Z:81S120M AS:i:61 XS:i:36 RG:Z:81 SA:Z:chrM.fa,107,+,99S21M81S,12,0; CO:Z:Set1_00031_00107-Set1_00031_00107
M50243:17:000000000-JWNTH:1:1115:7439:19293 147 chrM.fa 31 60 77M = 31 -120 CACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATCGCGGGACGCTGGAGCCGG GGGGGGGGGGGGGGGGGGGGGEGGGGGGGEGFFEGGGGGFGGGGGGGGGGGGGGGFGGGGGGEGGGGGGGGGFDGGF NM:i:2 MD:Z:81T3A34 MC:Z:81M120S AS:i:110 XS:i:53 RG:Z:81 CO:Z:Set1_00031_00107-Set1_00031_00107
For the first line, the CIGAR string indicates a length of 178 bp, and a character count of the provided sequence is also 178. For the second line, the CIGAR string indicates a length of 77 bp, and the provided sequence is 77 characters. I realize that the two lines do not match each other, but I'm not sure whether this is the issue. I believe I have the same ID twice because I am working with paired-end short-read sequencing chemistry. If someone could help me figure out what is causing this error, it would be much appreciated.
Thanks in advance!
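For anyone who wants to repeat the length check described in the question, here is a minimal Python sketch. It assumes the record is available as a tab-separated SAM text line, and the function names are hypothetical (they are not part of samtools). Per the SAM specification, only the M, I, S, = and X operations consume query bases.

```python
import re

# CIGAR operations that consume query (read) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar):
    """Sum the lengths of the CIGAR operations that consume query bases."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def cigar_and_seq_lengths(sam_line):
    """Return (query length implied by CIGAR, length of SEQ) for one SAM line.

    Assumes the line is tab-separated with CIGAR in column 6 and SEQ in column 10.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    return cigar_query_length(cigar), len(seq)
```

For the two CIGAR strings quoted above, 58M120S gives 178 and 77M gives 77.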
What version of samtools are you using?
Hi, I have tried versions 1.13 and 1.15; both give the same error.
The MD tags seem to be incorrect; perhaps that is what the error indicates.
For example, in your second read, the CIGAR string is 77M but the MD tag is MD:Z:81T3A34. The MD tag indicates 81 matches followed by a mismatch T, etc.
How was this BAM file generated?
(Edit: the MC tags are also incorrect)
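The CIGAR/MD comparison above can be sketched in Python as follows. This is only an illustration with hypothetical function names, not something samtools provides. It assumes the usual MD layout (match counts, mismatched reference bases, and ^-prefixed deleted bases) and compares the number of reference bases described by MD against the M, =, X and D operations in the CIGAR string.

```python
import re


def md_reference_length(md):
    """Count reference bases described by an MD value such as '81T3A34'."""
    total = 0
    # Each token is a match count, a ^-prefixed deletion, or a single mismatch base.
    for number, deletion, mismatch in re.findall(
        r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md
    ):
        if number:
            total += int(number)
        elif deletion:
            total += len(deletion)
        else:
            total += 1
    return total


def cigar_md_span(cigar):
    """Count reference bases covered by M, =, X and D operations in a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MD=X")
```

With the values quoted above, 81T3A34 describes 120 reference bases while 77M spans 77, so the two disagree.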
I see, thanks for this. I am new to CIGAR strings, MD tags, etc., so this is good to know.
The BAM file was generated on our MiSeq FGx (previously made by Illumina, but now made by Verogen) using the default Verogen pipeline. I don't know the details of how Verogen's software generates BAM files, so unfortunately I'm not sure why the MD or MC tags would be incorrect. The MiSeq also generates fastq files that I know I can use to generate my own BAM files, but I am interested in the Verogen-generated BAM files specifically.
I would contact Verogen support for answers about this particular data.
Your first sequence seems to have an invalid QUAL character:
...18+@69,EC AE+8=,,@+...
That's either a space or a tab, but I can't tell here. I assume it's a space or some other non-printing character that Biostars has rendered as a space. If you correct this, does the error go away? When I replace it with a comma, samtools accepts your data just fine.
Maybe your sequence line has a space in it and runs into the QUAL, which is why you get the error. Biostars doesn't preserve whitespace, though (or you lost it during posting), so I'm not sure my file is the same as yours. Tabifying it correctly makes samtools happy, though.
I don't think the MD/MC tag will cause this error to be generated.
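A quick way to look for the kind of stray whitespace described above is sketched below in Python. It works on SAM text lines only (for example, lines pasted from a post), and the function name is hypothetical. It assumes tab-separated fields with SEQ in column 10 and QUAL in column 11, and flags QUAL characters outside the printable range '!' to '~' that the SAM specification allows.

```python
def find_field_problems(sam_line):
    """Return a list of human-readable problems found in one SAM text line."""
    fields = sam_line.rstrip("\n").split("\t")
    problems = []
    if len(fields) < 11:
        # A space used where a tab was expected can merge or drop columns.
        problems.append(
            f"expected at least 11 tab-separated fields, found {len(fields)}"
        )
        return problems
    seq, qual = fields[9], fields[10]
    if any(c.isspace() for c in seq):
        problems.append("SEQ contains whitespace")
    bad = sorted({c for c in qual if not "!" <= c <= "~"})
    if qual != "*" and bad:
        problems.append(f"QUAL contains characters outside '!'..'~': {bad!r}")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append(
            f"SEQ length {len(seq)} differs from QUAL length {len(qual)}"
        )
    return problems
```

An empty list means none of these particular checks fired; it does not prove the record is valid.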