How to read a VCF file in Python?
4.9 years ago
ja4123 ▴ 30

When I try something as simple as this:

import vcf
vcf_reader = vcf.Reader(filename="in.vcf.gz")

I get this error:

AttributeError: partially initialized module 'vcf' has no attribute 'Reader' (most likely due to a circular import)

But the vcf module does have that attribute. Kindly help.

vcf reader python • 39k views

Also, it sounds like your installation of PyVCF is broken. I would consider trying the version in conda: https://anaconda.org/bioconda/pyvcf
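
Before reinstalling, it may be worth checking which file Python actually loads for `import vcf`. One common cause of a "partially initialized module" error is a script or folder in the working directory that is itself named vcf (for example vcf.py), which shadows the installed package. The sketch below uses only the standard library; `locate_module` is a hypothetical helper name, not part of PyVCF.

```python
import importlib.util


def locate_module(name):
    """Return the file that `import name` would load, or None if nothing is found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# If this prints a path inside your own project (for example ./vcf.py)
# rather than a site-packages directory, a local file is shadowing the package.
print(locate_module("vcf"))
```

If the path points at your own file, renaming that file (and deleting any stale vcf.pyc or __pycache__ entry next to it) is usually enough.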


I always read it with pandas (after removing the header lines).


Personally, I just use GATK VariantsToTable to convert it to a .tsv first. It's much easier to parse this way, unless you want something from the header. Another option might be to convert to another tabular format such as .maf.

4.9 years ago
onestop_data ▴ 330

Try pysam. You can easily install it with pip (pip install pysam).

3.4 years ago
d.vitale199 ▴ 60

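I like to use pandas. I find the line that starts with '#CHROM', split it into a list of column names to pass as names=<list of names>, and read the file in chunks with comment='#'.

import gzip

import pandas as pd


def get_vcf_names(vcf_path):
    """Return the column names from the '#CHROM' line of a gzipped VCF (None if absent)."""
    vcf_names = None
    # The with block closes the file, so no explicit close() is needed.
    with gzip.open(vcf_path, "rt") as ifile:
        for line in ifile:
            if line.startswith("#CHROM"):
                # Drop the trailing newline so the last column name is clean.
                vcf_names = line.rstrip("\n").split("\t")
                break
    return vcf_names


names = get_vcf_names('file.vcf.gz')
# VCF columns are tab-separated. With chunksize set, read_csv returns an
# iterator of DataFrames rather than a single DataFrame.
vcf = pd.read_csv('file.vcf.gz', compression='gzip', comment='#', chunksize=10000, sep='\t', header=None, names=names)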
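
Because chunksize is set, the read_csv call gives back an iterator of DataFrames, so the chunks still have to be consumed. A rough sketch of one way to do that, keeping only the rows for one chromosome; `load_filtered_vcf` and its parameters are hypothetical names, and it assumes the first entry of `names` is the chromosome column.

```python
import pandas as pd


def load_filtered_vcf(source, names, chrom, chunksize=10000):
    """Read a VCF in chunks and keep only the rows for one chromosome."""
    reader = pd.read_csv(
        source,
        sep="\t",
        comment="#",
        header=None,
        names=names,
        dtype={names[0]: str},  # keep chromosome labels such as "1" and "X" as text
        chunksize=chunksize,
    )
    # Filter each chunk as it arrives so the whole file never sits in memory at once.
    kept = [chunk[chunk[names[0]] == chrom] for chunk in reader]
    return pd.concat(kept, ignore_index=True)
```

One caveat with comment='#': pandas cuts a line at the first '#' wherever it occurs, so a '#' inside a data field would truncate that row.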

I have a zip file instead of gzip, so how should I change my code?
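
One possible adaptation, sketched with the standard-library zipfile module. `get_vcf_names_from_zip` and its `member` parameter are hypothetical names, and the code assumes the archive holds a single VCF unless a member name is given.

```python
import io
import zipfile


def get_vcf_names_from_zip(zip_path, member=None):
    """Return column names from the '#CHROM' line of a VCF stored in a .zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        if member is None:
            # Assumption: the first entry in the archive is the VCF.
            member = archive.namelist()[0]
        # ZipFile.open yields bytes, so wrap it to iterate over text lines.
        with archive.open(member) as raw, io.TextIOWrapper(raw, encoding="utf-8") as text:
            for line in text:
                if line.startswith("#CHROM"):
                    # Strip the leading '#' and the trailing newline before splitting.
                    return line.strip("#\n").split("\t")
    return None
```

For the pandas call itself, read_csv also accepts compression='zip', provided the archive contains exactly one file.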


The line under the if statement could be improved so that it also drops the leading '#' and the trailing newline:

vcf_names = line.strip('#\n').split('\t')
21 months ago
BCArg ▴ 90

I actually find PyVCF useful for parsing VCF files; its records expose a lot of useful attributes.

Below is the code I use to get attributes from each entry in the vcf:

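import vcf

vcf_full_path = '/path/to/file.vcf'

# vcf.Reader is an iterable of records, from which you can get
# attributes such as CHROM, POS, ID, REF and ALT.
with open(vcf_full_path, 'r') as handle:
    records = vcf.Reader(handle)
    for row in records:
        # Names chosen to avoid shadowing the built-ins chr() and id().
        chrom = row.CHROM
        pos = row.POS
        var_id = row.ID
        ref = row.REF
        alt = row.ALT
        # Print inside the loop so every record is reported, not only the last one.
        print(f"chr is {chrom}, pos is {pos}, alternate allele is {alt}")

Example output for one record:
chr is 1, pos is 781258, alternate allele is [T]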