Question

Mapping quality score of clipped reads

0

Entering edit mode

5.2 years ago

radwa.raed ▴ 40

Hi, I have a question please. For soft-clipped or hard-clipped reads, for example a read with a cigar string 100M50S, does the mapping quality score of the read reflect: 1. Mapping quality of both the 100 bases, as well as the 50 clipped bases? 2. Mapping quality of the 100 bases only?

If it is the combined quality of both the 100 bases + 50 bases, how do I extract the mapping quality of only the 100 matched bases in pysam? Thanks!

pysam python bam • 1.4k views

ADD COMMENT • link updated 5.2 years ago by swbarnes2 14k • written 5.2 years ago by radwa.raed ▴ 40

score 0 · Answer 1 · 2019-10-29

0

Entering edit mode

5.2 years ago

swbarnes2 14k

The mapping score of the read as a whole has little to do with the quality of the individual letters. It's a measure of how likely it is that the read is aligned to its real position in the genome. It's more a measure of uniqueness of that sequence in your genome than anything else, though if there are many mismatches between the read and the reference that can also make the software question if it put the read in the right place.

ADD COMMENT • link 5.2 years ago by swbarnes2 14k

0

Entering edit mode

The uniqueness of the 100 nucleotides that matched, or the uniqueness of the 100 nucleotides that matched at position X PLUS the uniqueness of the 50 nucleotides that were soft-clipped and matched at position Y?

ADD REPLY • link 5.2 years ago by radwa.raed ▴ 40

0

Entering edit mode

Since the 50 clipped bases are not part of the alignment, they obviously do not count. Why do you think that the 50 base match anywhere? If they were adapter, they would not belong to the reference genome at all.

ADD REPLY • link 5.2 years ago by swbarnes2 14k

0

Entering edit mode

Thank you! I know that the clipped bases align elsewhere because it is a chimeric read. Below is a supplementary alignment for one of my reads with a mapping quality score of 43.This should map to chr2: 29446797 in hg19

03012p9SX:030B 2064 1 29446797 43 36M241H -1 -1 36 CCCCACGTGAACGAGGGAGGGAGGGAGGGTTGGGTG array('B', [32, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41]) [('SA', 'chr2,42479618,+,238M39S,60,2;'), ('MD', '26A9'), ('RG', 'SAMPLE.sample23'), ('NM', 1), ('AS', 31), ('XS', 23), ('fm', '3_0')]

Since the read is on the negative strand, below is the forward sequence: CACCCAACCCTCCCTCCCTCCCTCGTTCACGTGGGG

However this sequence does not map to chr2: 29446797. I tried blasting but I think it is too short so I aligned it to the genomic sequence as you can see here https://ibb.co/R7VHrQH This is not a good alignment at all. Is BWA wrong? I thought perhaps if the alignment score is a total of both the matched and clipped bases, it could explain the issue. Thank you very much!

ADD REPLY • link 5.2 years ago by radwa.raed ▴ 40