Cross-posted on Bioinformatics.SE. I hope that's OK. Answer was provided there. Short version: pysam pileup()
just wraps samtools mpileup
, and mpileup
by default filters out reads not just with a base quality—but also with a "base alignment quality"—lower than 13.
I thought that iff a read matches the reference at a position, it would appear in the pileup column for that position. But tracking a single read in pysam shows that that's not the case.
This read appears in some pileup columns for positions to which it does not map, and does not appear in some pileup columns to which it does map.
What's wrong in my understanding?
#open BAM
bamPath = "/path/to/example.bam"
bamData = pysam.AlignmentFile(bamPath, "rb")
#position we want to find reads for
chrom = "5"
pos = 345515
windowStart = 340001
windowEnd = 350001
#read to look out for
idForReadOfInterest = "E00489:44:HNNVYCCXX:1:2120:26524:19135"
def idForRead(read):
return read.tostring().split("\t")[0]
#extract the reads with pileup
columnRPs = []
readOfInterest = None
for column in bamData.pileup(chrom, windowStart, windowEnd):
for pRead in column.pileups:
read = pRead.alignment
if idForRead(read) == idForReadOfInterest:
readOfInterest = read
columnRPs.append(column.reference_pos)
print("read appears in pileup columns for these {} positions".format(len(columnRPs)))
print(columnRPs)
alignedPairs = readOfInterest.get_aligned_pairs()
alignedPositions = [x[1] for x in alignedPairs]
#filter to take only positions where read actually matches ref
matchPositions = []
for block in readOfInterest.get_blocks():
for p in alignedPositions:
if p >= block[0] and p < block[1]: #inclusive, exclusive?
matchPositions.append(p)
print("read matches at these {} positions".format(len(matchPositions)))
print(matchPositions)
read appears in pileup columns for these 167 positions [345247, 345248, 345249, 345250, 345251, 345252, 345253, 345254, 345255, 345256, 345257, 345258, 345259, 345260, 345261, 345262, 345263, 345264, 345265, 345266, 345267, 345268, 345269, 345270, 345271, 345272, 345273, 345274, 345275, 345276, 345277, 345278, 345279, 345280, 345281, 345282, 345283, 345284, 345285, 345286, 345287, 345288, 345289, 345290, 345292, 345294, 345445, 345446, 345447, 345448, 345449, 345451, 345452, 345453, 345454, 345455, 345456, 345457, 345458, 345459, 345460, 345462, 345463, 345464, 345465, 345466, 345469, 345470, 345473, 345474, 345475, 345476, 345478, 345481, 345482, 345485, 345487, 345490, 345492, 345495, 345497, 345498, 345499, 345500, 345501, 345502, 345503, 345504, 345505, 345506, 345507, 345508, 345509, 345510, 345511, 345513, 345518, 345523, 345524, 345525, 345527, 345528, 345529, 345530, 345532, 345533, 345534, 345535, 345536, 345537, 345538, 345539, 345540, 345542, 345544, 345545, 345546, 345547, 345548, 345549, 345550, 345551, 345552, 345553, 345554, 345555, 345556, 345557, 345558, 345559, 345560, 345561, 345562, 345563, 345564, 345565, 345566, 345567, 345568, 345569, 345570, 345571, 345572, 345573, 345574, 345575, 345576, 345577, 345578, 345579, 345580, 345581, 345582, 345583, 345584, 345585, 345586, 345587, 345588, 345589, 345590, 345591, 345592, 345593, 345594, 345595, 345596]
read matches at these 150 positions [345445, 345446, 345447, 345448, 345449, 345450, 345453, 345454, 345455, 345456, 345457, 345458, 345459, 345460, 345461, 345462, 345463, 345464, 345465, 345466, 345467, 345468, 345469, 345470, 345471, 345472, 345473, 345474, 345475, 345476, 345477, 345478, 345479, 345480, 345481, 345482, 345483, 345484, 345485, 345486, 345487, 345488, 345489, 345490, 345491, 345492, 345493, 345494, 345495, 345496, 345497, 345498, 345499, 345500, 345501, 345502, 345503, 345504, 345505, 345506, 345507, 345508, 345509, 345510, 345511, 345512, 345513, 345514, 345515, 345516, 345517, 345518, 345519, 345520, 345521, 345522, 345523, 345524, 345525, 345526, 345527, 345528, 345529, 345530, 345531, 345532, 345533, 345534, 345535, 345536, 345537, 345538, 345539, 345540, 345541, 345542, 345543, 345544, 345545, 345546, 345547, 345548, 345549, 345550, 345551, 345552, 345553, 345554, 345555, 345556, 345557, 345558, 345559, 345560, 345561, 345562, 345563, 345564, 345565, 345566, 345567, 345568, 345569, 345570, 345571, 345572, 345573, 345574, 345575, 345576, 345577, 345578, 345579, 345580, 345581, 345582, 345583, 345584, 345585, 345586, 345587, 345588, 345589, 345590, 345591, 345592, 345593, 345594, 345595, 345596]