Question

How to read a VCF file in Python?

1

5.4 years ago

ja4123 ▴ 30

When I try something as simple as this:

import vcf
vcf_reader = vcf.Reader(filename="in.vcf.gz")

there is an error:

AttributeError: partially initialized module 'vcf' has no attribute 'Reader' (most likely due to a circular import)

But the vcf module does have that attribute. Kindly help.
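
This error message typically appears when Python picks up a local file named vcf.py (often the script itself) instead of the installed PyVCF package. Below is a minimal sketch, using only the standard library, for checking which file a module name resolves to. `module_origin` is a hypothetical helper name, not part of PyVCF.

```python
import importlib.util


def module_origin(name):
    """Return the file path that `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# If this prints a path inside your own project (for example ./vcf.py),
# a local file is shadowing the installed package; renaming it should help.
print(module_origin("vcf"))
```

If the printed path points into site-packages, shadowing is probably not the cause and the installation itself is worth checking.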

vcf reader python • 42k views

ADD COMMENT • link updated 2.3 years ago by BCArg ▴ 90 • written 5.4 years ago by ja4123 ▴ 30

2

Also, it sounds like your installation of PyVCF is broken. I would consider trying the version in conda: https://anaconda.org/bioconda/pyvcf

ADD REPLY • link 5.4 years ago by steve ★ 3.5k

1

I always read it with pandas (after removing the header lines).

ADD REPLY • link 5.4 years ago by shoujun.gu ▴ 350

1

Personally, I just use GATK VariantsToTable to convert it to a .tsv first. It's much easier to parse this way, unless you want something from the header. Another option might be to convert to another tabular format such as .maf.

ADD REPLY • link 5.4 years ago by steve ★ 3.5k

6

3.9 years ago

d.vitale199 ▴ 60

I like to use pandas. I find the line that starts with '#CHROM', split it to build the list of column names passed as names=<list of names>, and read the file in chunks with comment='#'.

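import gzip

import pandas as pd


def get_vcf_names(vcf_path):
    """Return the column names from the '#CHROM' header line, or None if absent."""
    vcf_names = None
    with gzip.open(vcf_path, "rt") as ifile:
        for line in ifile:
            if line.startswith("#CHROM"):
                # Drop the trailing newline so the last column name is clean.
                vcf_names = line.rstrip("\n").split("\t")
                break
    # No explicit close() is needed: the with block closes the file.
    return vcf_names


names = get_vcf_names('file.vcf.gz')
# With chunksize set, read_csv returns an iterator of DataFrames, not a single DataFrame.
# sep='\t' is used because VCF is tab-delimited (delim_whitespace is deprecated in recent pandas releases).
vcf = pd.read_csv('file.vcf.gz', compression='gzip', comment='#', chunksize=10000, sep='\t', header=None, names=names)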

ADD COMMENT • link 3.9 years ago by d.vitale199 ▴ 60

0

I have a zip file instead of gzip. How should I change the code?

ADD REPLY • link 3.4 years ago by anasjamshed ▴ 140
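
For a .zip archive, one possible adaptation is to read the header through the standard-library zipfile module instead of gzip. This is only a sketch: `get_vcf_names_from_zip` and its `member` parameter are hypothetical names, and it assumes the VCF is the first file in the archive unless a member name is given. On the pandas side, read_csv accepts compression='zip' when the archive contains a single data file.

```python
import io
import zipfile


def get_vcf_names_from_zip(zip_path, member=None):
    """Return the column names from the #CHROM line of a VCF inside a .zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: if no member name is given, the VCF is the first file in the archive.
        member = member or archive.namelist()[0]
        with archive.open(member) as raw:
            # archive.open() yields bytes, so wrap it to iterate over text lines.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if line.startswith("#CHROM"):
                    return line.rstrip("\n").split("\t")
    # No #CHROM line was found.
    return None
```

The returned list can be passed as names=... to read_csv in the same way as in the gzip version.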

0

The line under the if statement could be improved to the following, which also strips the leading '#' and the trailing newline:

vcf_names = line.strip('#\n').split('\t')

ADD REPLY • link 2.3 years ago by vladimir.yu.kiselev ▴ 30
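
To see what the suggested change does, here is a small comparison on a made-up header line (the column list is shortened for illustration):

```python
header = "#CHROM\tPOS\tID\tREF\tALT\n"

# A plain split keeps the leading '#' and the trailing newline.
plain = header.split("\t")

# strip('#\n') removes '#' and newline characters from both ends before splitting.
cleaned = header.strip("#\n").split("\t")

print(plain[0], repr(plain[-1]))      # prints: #CHROM 'ALT\n'
print(cleaned[0], repr(cleaned[-1]))  # prints: CHROM 'ALT'
```

Clean column names matter mostly for the last column, where a trailing newline would otherwise end up in the DataFrame header.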

2

2.3 years ago

BCArg ▴ 90

I actually find PyVCF useful for parsing VCF files; it exposes a lot of useful attributes.

Below is the code I use to get attributes from each entry in the vcf:

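import vcf

vcf_fullPath = '/path/to/file.vcf'

# vcf.Reader yields one record per variant line, with attributes such as CHROM, POS, REF and ALT.
with open(vcf_fullPath, 'r') as handle:
    records = vcf.Reader(handle)
    for row in records:
        # Names chosen to avoid shadowing the built-ins chr() and id().
        chrom = row.CHROM
        pos = row.POS
        var_id = row.ID
        ref = row.REF
        alt = row.ALT
        # Printing inside the loop reports every record, not only the last one.
        print(f"chr is {chrom}, pos is {pos}, alternate allele is {alt}")

Example output for one record:

chr is 1, pos is 781258, alternate allele is [T]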

ADD COMMENT • link 2.3 years ago by BCArg ▴ 90
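
If PyVCF cannot be installed, the same loop can be approximated with the standard library alone. This is a rough sketch, not PyVCF's behavior: `Variant` and `iter_variants` are hypothetical names, only the first five columns are parsed, and the example lines contain made-up illustrative values.

```python
from collections import namedtuple

# Hypothetical record type; field names mirror the first five VCF columns.
Variant = namedtuple("Variant", "chrom pos id ref alt")


def iter_variants(lines):
    """Yield a Variant for each data line, skipping '#' header lines and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        # ALT may list several alleles separated by commas.
        yield Variant(chrom, int(pos), vid, ref, alt.split(","))


# Made-up example lines; with a real file, pass the open file handle instead.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t781258\t.\tC\tT\n",
]
for v in iter_variants(example):
    print(f"chr is {v.chrom}, pos is {v.pos}, alternate allele is {v.alt}")
```

Unlike PyVCF, this does not interpret INFO, FORMAT, or genotype columns, so it is only suitable when the fixed fields are enough.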

score 4 · Accepted Answer · 2020-01-19

4

5.4 years ago

onestop_data ▴ 330

Try pysam. You can install it with pip (pip install pysam).

ADD COMMENT • link 5.4 years ago by onestop_data ▴ 330