Mapping quality score of clipped reads
1
0
Entering edit mode
5.1 years ago
radwa.raed ▴ 40

Hi, I have a question please. For soft-clipped or hard-clipped reads, for example a read with a cigar string 100M50S, does the mapping quality score of the read reflect: 1. Mapping quality of both the 100 bases, as well as the 50 clipped bases? 2. Mapping quality of the 100 bases only?

If it is the combined quality of both the 100 bases + 50 bases, how do I extract the mapping quality of only the 100 matched bases in pysam? Thanks!

pysam python bam • 1.3k views
ADD COMMENT
0
Entering edit mode
5.1 years ago

The mapping score of the read as a whole has little to do with the quality of the individual letters. It's a measure of how likely it is that the read is aligned to its real position in the genome. It's more a measure of uniqueness of that sequence in your genome than anything else, though if there are many mismatches between the read and the reference that can also make the software question if it put the read in the right place.

ADD COMMENT
0
Entering edit mode

The uniqueness of the 100 nucleotides that matched, or the uniqueness of the 100 nucleotides that matched at position X PLUS the uniqueness of the 50 nucleotides that were soft-clipped and matched at position Y?

ADD REPLY
0
Entering edit mode

Since the 50 clipped bases are not part of the alignment, they obviously do not count. Why do you think that the 50 base match anywhere? If they were adapter, they would not belong to the reference genome at all.

ADD REPLY
0
Entering edit mode

Thank you! I know that the clipped bases align elsewhere because it is a chimeric read. Below is a supplementary alignment for one of my reads with a mapping quality score of 43.This should map to chr2: 29446797 in hg19

03012p9SX:030B 2064 1 29446797 43 36M241H -1 -1 36 CCCCACGTGAACGAGGGAGGGAGGGAGGGTTGGGTG array('B', [32, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41]) [('SA', 'chr2,42479618,+,238M39S,60,2;'), ('MD', '26A9'), ('RG', 'SAMPLE.sample23'), ('NM', 1), ('AS', 31), ('XS', 23), ('fm', '3_0')]

Since the read is on the negative strand, below is the forward sequence: CACCCAACCCTCCCTCCCTCCCTCGTTCACGTGGGG

However this sequence does not map to chr2: 29446797. I tried blasting but I think it is too short so I aligned it to the genomic sequence as you can see here https://ibb.co/R7VHrQH This is not a good alignment at all. Is BWA wrong? I thought perhaps if the alignment score is a total of both the matched and clipped bases, it could explain the issue. Thank you very much!

ADD REPLY

Login before adding your answer.

Traffic: 2674 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6