Plotting insert size vs base quality means
0
0
Entering edit mode
2.1 years ago
pablosolar.r ▴ 20

Hi all!

I am using PySam to calculate the insert reads and the mean base quality for the reads of a BAM as follows:

import math
import pysam
import numpy as np
import matplotlib.pyplot as plt

bam_file = pysam.AlignmentFile("mybam.bam", "rb")

mean_base_qualities = []
insert_sizes = []
zipped = []
for read in bam_file:
    if read.is_paired and read.is_proper_pair:
        # length
        mean_bq = np.mean(read.query_qualities)
        read_length = read.query_alignment_length
        tlen = read.template_length
        insert_size = int(math.sqrt(math.pow(tlen, 2))-read_length)
        if insert_size > 0:
            zipped.append((mean_bq, insert_size))
            mean_base_qualities.append(mean_bq)
            insert_sizes.append(insert_size)


plt.plot(mean_base_qualities,insert_sizes)
plt.show()

Finally, I've got two huge lists representing the insert sizes and the mean base quality for a BAM reads, like:

insert_sizes = [291, 188, 165, 212, 186, 127, 240, 166, 120, 303, 255, ...

mean_base_qualities = [36.24324324324324, 34.73831775700935, 35.3448275862069, 30.64788732394366, 36.467289719626166, 22.673267326732674, 36.54216867469879, 31.20754716981132, 25.085714285714285, 34.65625,  ...

When I try to plot them I am getting the following graph but it is obviously wrong:

enter image description here

I also tried to sort the lists independently getting:

enter image description here

And also tried zipping the lists and different ways of plotting but with no succeed. Could you help me, please? What I am missing? Thank you in advance!

samtools matplotlib python • 730 views
ADD COMMENT
0
Entering edit mode

I know next to nothing about matplotlib, but I think you want matplotlib.pyplot.scatter.

ADD REPLY

Login before adding your answer.

Traffic: 2413 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6