memory leak using pysam fetch
10.0 years ago
always_learning ★ 1.1k

Hi all,

I am using the pysam module in one of my scripts, which works on fairly large VCF files, but the job fails to complete every time and reports a memory problem, even when I run it on a larger and faster machine. Has anyone run into a similar issue with pysam on large files?

This is my python script:

import sys
import os
import pysam

freq_dir_file = sys.argv[1]
vcf_dir_file = sys.argv[2]
snp_pos = []

# Print the VCF header lines (those starting with '#') found in the first 5000 lines.
os.environ['vcf_file'] = vcf_dir_file
os.system("zcat $vcf_file | head -5000 | parallel --pipe grep '^#'")

data = open(freq_dir_file)
for line in data:
    check = 0  # reset for every line so an earlier match does not carry over
    fields = line.strip().split("\t")
    if not line.startswith("CHROM") and not fields[0] == "NA":
        # Columns from the fifth onward hold ALLELE:FREQ entries.
        for i in fields[4:]:
            val = i.strip().split(":")[1]
            num = float(val)
            # Comment this if its for Low frequency variants
            # if num > 0.005 and num < 0.050:
            # Comment this if its for coding region
            if num < 0.005 and not num == 0:
                check = 1
    if check == 1:
        chmpos = fields[0] + " " + fields[1]
        snp_pos.append(chmpos)
data.close()

# Fetch the VCF record at each selected position (tabix intervals are 0-based, half-open).
tabixfile = pysam.Tabixfile(vcf_dir_file)
for i in snp_pos:
    (chrom, snp) = i.split(" ")[0], i.split(" ")[1]
    val = int(snp) - 1
    for vcf in tabixfile.fetch(str(chrom), val, int(snp)):
        print(vcf)
python pysam VCF • 3.0k views

Are you sure this is a memory leak in pysam? Python itself isn't exactly the best with memory management, so if freq_dir_file is large then I could see snp_pos blowing up the available memory. Having said that, I've never looked at the underlying tabix C code, so perhaps there's an issue there.
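
One way to rule out snp_pos as the culprit is to avoid building the list at all and stream positions straight into the tabix lookup. The following is only a rough sketch: the function names iter_rare_positions and print_matching_records are made up for illustration, the 0.005 threshold is taken from the posted script, and the frequency file is assumed to be tab-separated with ALLELE:FREQ entries from the fifth column onward, as the script implies.

```python
def iter_rare_positions(lines, max_freq=0.005):
    """Yield (chrom, pos) for rows where any allele frequency lies in (0, max_freq).

    `lines` is any iterable of text lines, e.g. an open file handle.
    Header rows (starting with CHROM) and rows whose first field is NA are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("CHROM") or fields[0] == "NA":
            continue
        # Each entry from the fifth column onward is assumed to look like "A:0.0012".
        freqs = (float(entry.split(":")[1]) for entry in fields[4:])
        if any(0 < f < max_freq for f in freqs):
            yield fields[0], int(fields[1])


def print_matching_records(freq_path, vcf_path):
    """Look up each selected position in a tabix-indexed VCF, one at a time."""
    import pysam  # imported here so the generator above can be used without pysam

    tabixfile = pysam.TabixFile(vcf_path)
    try:
        with open(freq_path) as handle:
            for chrom, pos in iter_rare_positions(handle):
                # Tabix intervals are 0-based and half-open: [pos - 1, pos)
                for record in tabixfile.fetch(chrom, pos - 1, pos):
                    print(record)
    finally:
        tabixfile.close()
```

With this layout only one position is held in memory at a time, so if memory still grows the list can be excluded as the cause.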


Since I am working with 32 GB of RAM, the chances of snp_pos exhausting the whole system memory are very low.


Anyone?


Do you know at which line in the code the memory leak occurs? How many items do you expect in snp_pos? Also, can you give us the log of the error?
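
If it is not obvious where the memory goes, a small measurement wrapper can help narrow it down. This is a minimal sketch using only the standard library; the helper names peak_rss and measure are hypothetical. Note that tracemalloc only sees allocations made through Python's allocator, so growth inside the underlying C code would show up in the process-level figure rather than in the tracemalloc one. The resource module is Unix-only, and ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import tracemalloc


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure(func, *args):
    """Run func(*args) and report Python-level and process-level memory growth."""
    rss_before = peak_rss()
    tracemalloc.start()
    try:
        result = func(*args)
        _current, python_peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    stats = {
        "python_peak_bytes": python_peak,            # allocations seen by tracemalloc
        "peak_rss_growth": peak_rss() - rss_before,  # whole process, includes C code
    }
    return result, stats


def build_positions(n):
    """Stand-in for the list-building step: n 'chrom pos' strings."""
    return ["1 " + str(i) for i in range(n)]


if __name__ == "__main__":
    positions, stats = measure(build_positions, 100000)
    print(len(positions), stats)
```

Wrapping the list-building loop and the fetch loop separately would show which of the two accounts for the growth; a large peak_rss_growth with a small python_peak_bytes would point at memory allocated outside Python.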


Try filing an issue.
