How to extract the GT from a given sample in a multisample VCF with cyvcf2?
1
0
Entering edit mode
2.0 years ago
LGMgeo ▴ 110

Here is my VCF multisamples:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO                                           FORMAT   sample1   sample2 

chr1           801950  .       T    <DEL>   2235      PASS   SVTYPE=DEL;END=838297      GT:GQ:FT       0/0:.:. 0/0:.:.

By using cyvcf2, how can I get GT(sample1) or GT(sample2)?

I tried something like that:

from cyvcf2 import VCF

vcf = VCF(vcfFile)

for variant in vcf:

      vcf.sample

      variant.gt_types

Thank you for your help

cyvcf2 • 1.9k views
ADD COMMENT
0
Entering edit mode

vcf.samples ? (I have not tried it myself, therefore I leave this as a comment).

ADD REPLY
0
Entering edit mode

it's useful to get a list of all sample names ;o)

ADD REPLY
0
Entering edit mode

Get all the genotypes with variant.genotypes. It will return a list containing the allele and phasing information for each sample. Access the sample-specific data with the respective index.

Ref: https://brentp.github.io/cyvcf2/docstrings.html#cyvcf2.cyvcf2.Variant.genotypes

ADD REPLY
0
Entering edit mode
2.0 years ago
Dave Carlson ★ 1.9k

You can access a sample-specific GT based on the index of the sample. For example, to print all the GTs for "sample1":

from cyvcf2 import VCF

vcf = VCF(vcfFile)

for variant in vcf:
    print(variant.gt_types[0])

Likewise, for "sample2":

from cyvcf2 import VCF

vcf = VCF(vcfFile)

for variant in vcf:
    print(variant.gt_types[1])
ADD COMMENT
0
Entering edit mode

Thank you all for your responses.

In fact, I have a list of 900 samples. And it would be easier for me to access the GT from the sample name rather than the sample index. But if it doesn't exist, no problem, I'll use the index.

ADD REPLY
1
Entering edit mode
from cyvcf2 import VCF
myvcf = VCF('1kgp.vcf.gz')

my_target_list = ['HG00124', 'HG00501', 'HG00635', 'HG02024'] #list of sample names

def get_sample_gts(vcf,target_list):
    target_index = [vcf.samples.index(x) for x in target_list]
    for v in vcf:
        target_gts = [v.genotypes[i] for i in target_index]
        print(v.REF,v.ALT,v.POS,target_gts)

get_sample_gts(myvcf,my_target_list)
ADD REPLY
0
Entering edit mode

thanks Arup!

ADD REPLY

Login before adding your answer.

Traffic: 1972 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6