Entering edit mode
2.1 years ago
pablosolar.r
▴
20
Hi all!
I am using PySam to calculate the insert reads and the mean base quality for the reads of a BAM as follows:
import math
import pysam
import numpy as np
import matplotlib.pyplot as plt
bam_file = pysam.AlignmentFile("mybam.bam", "rb")
mean_base_qualities = []
insert_sizes = []
zipped = []
for read in bam_file:
if read.is_paired and read.is_proper_pair:
# length
mean_bq = np.mean(read.query_qualities)
read_length = read.query_alignment_length
tlen = read.template_length
insert_size = int(math.sqrt(math.pow(tlen, 2))-read_length)
if insert_size > 0:
zipped.append((mean_bq, insert_size))
mean_base_qualities.append(mean_bq)
insert_sizes.append(insert_size)
plt.plot(mean_base_qualities,insert_sizes)
plt.show()
Finally, I've got two huge lists representing the insert sizes and the mean base quality for a BAM reads, like:
insert_sizes = [291, 188, 165, 212, 186, 127, 240, 166, 120, 303, 255, ...
mean_base_qualities = [36.24324324324324, 34.73831775700935, 35.3448275862069, 30.64788732394366, 36.467289719626166, 22.673267326732674, 36.54216867469879, 31.20754716981132, 25.085714285714285, 34.65625, ...
When I try to plot them I am getting the following graph but it is obviously wrong:
I also tried to sort the lists independently getting:
And also tried zipping the lists and different ways of plotting but with no succeed. Could you help me, please? What I am missing? Thank you in advance!
I know next to nothing about matplotlib, but I think you want
matplotlib.pyplot.scatter
.