What Does This Cigar Mean?
4
2
Entering edit mode
13.4 years ago
Gangcai ▴ 230

Dear All, Following are part of the output of tophat(v1.3.0):

DBV2SVN1_64_7_2105_14622_191104_0_2     256     I       4589    3       50M     *       0       0   AGTGTTAGTGTTGTTGTCTGTTGCTTGGTCAACATCATCTTGTCGTTTTT       HFHFHFHFHFHHHHHFHHFHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHE   NM:i:0  NH:i:2  CC:Z:II CP:i:4527909    HI:i:0
DBV2SVN1_64_7_2105_9755_2866_0_2        272     I       4600    1       3421396b        *       0   0GBGTGM=GGMGMRYMD==      dqj�Oc!ijd!�9I��i�

Does anybody known the meaning of CIGAR "3421396b"? Thanks.

samtools tophat cigar • 5.8k views
ADD COMMENT
0
Entering edit mode

Looks to me like corrupted or truncated file. Are you sure it's correct? I.e. does it continue with more reads with sensible content?

ADD REPLY
5
Entering edit mode
13.4 years ago

It's a representation of positional alignment pattern of one sequence against another: the match/mismatch [M], deletion [D] and insertion [I] patterns of an alignment, as well as upstream and downstream padding, an other possibilities. See a table here:

[?]

A few simple examples: [?] Reference AAAAAAAAAA Query AAAAAAAAAA CIGAR MMMMMMMMMM or 10M

Reference AAAAAAAAAAT Query AAAAA-AAAAT CIGAR MMMMMDMMMMM or 5M1D5M

Reference AAAAA-AAAAT Query AAAAAAAAAAT CIGAR MMMMMIMMMMM or 5M1I5M [?]

(not sure what you G and E are in the example)

ADD COMMENT
1
Entering edit mode

As far as I know your cigar operations should have a 1 in front of the D and I. It makes parsing much easier.

ADD REPLY
0
Entering edit mode

...and it's what samtools does, which also makes it a good idea. And the spec says: *|([0-9]+[MIDNSHPX=])+, so the number appears to be mandatory.

ADD REPLY
3
Entering edit mode
13.4 years ago

From the SAM v1.4 spec:

Bit 0x4 is the only reliable place to tell whether the fragment is unmapped. If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next fragment in the template.

In other words, the don't even look at the CIGAR unless bit 0x4 is not set. If I read your alignments above correctly (a little tough with a mixture of white space and tabs), both of the reads you report are unmapped so the CIGAR strings are irrelevant.

ADD COMMENT
0
Entering edit mode

Thanks for fixing the output in the original question. Brentp is right that there is a problem with that SAM line.

ADD REPLY
0
Entering edit mode

yeah, I wrapped the output in code tags so it would render more nicely.

ADD REPLY
0
Entering edit mode

Thanks again for making the original more easy to read.

ADD REPLY
0
Entering edit mode

Spec, schmec. BWA apparently reports mapq for unmapped reads (not sure about cigar) and this breaks Picard, which apparently does make assumptions for MAPQ.

http://seqanswers.com/forums/archive/index.php/t-4246.html

ADD REPLY
0
Entering edit mode

Not for all mapped reads. bwa concatenates the reference multi-fasta before aligning, and if the case of a read that hangs off the edge of one reference contig onto another, well, it would be a little misleading for the aligner to say that it has no idea where that read goes, so it gives a map position, and a mapq, and sets the unmapped flag too, so you can see clearly that something isn't quite right. It's a feature, not a bug (meant mostly seriously)

ADD REPLY
2
Entering edit mode
13.4 years ago
brentp 24k

That is bad output. e.g.: what does "0GBGTGM=GGMGMRYMD==" in the sequence column mean?

Plus that cigar string in nonsensical.

I would check that read in your sequence files and make sure it looks normal. Then try rerunning, if you can reproduce this with just a few reads--including that one, I would contact the tophat developers.

ADD COMMENT
0
Entering edit mode

I have checked the original reads, which has much shorter length than normal ones(18nt). @DBV2SVN1_64_7_2105_9755_2866_0_2 ACCGCGCAACAGACATCA + ggggggggggfgggeggg

Could this be the reason that tophat output such strange output?

ADD REPLY
0
Entering edit mode

probably, doesn't tophat try to split at 25bp?

ADD REPLY
0
Entering edit mode

If tophat produces that, regardless of input, I'd say it's a bug in tophat.

ADD REPLY
0
Entering edit mode
13.3 years ago
Marvin ▴ 900

Are you sure this is output from tophat? That second SAM line is obviously broken; I guess it's what samtools produces from a mangled BAM file.

Here's a wild guess: you ran samtools to make a BAM file, but then your hard disk became full. So you deleted a few files and everything seemed fine. Unfortunately, samtools has no error handling to speak of, so your BAM file is now missing data in the middle, and decoding it produces very weird effects.

ADD COMMENT

Login before adding your answer.

Traffic: 1291 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6