Extract heterozygous genotype (GT: 0/1 and 1/2) from the vcf file and calculate allelebalance.
1
0
Entering edit mode
8.7 years ago
kirannbishwa01 ★ 1.6k

Hi,

I need to extract heterozygous genotype from my vcf file. The genotype GT:0/1 and 1/2 should be extracted separately and is under the FORMAT field in vcf file. I am not posting the data from vcf file since there is been some formatting issue while pasting the values lately (not sure why???).

Also, I need to calculate the AB (allele balance) values for these heterozygous genotype (by sample). I have tried several vcf manipulator utilites but its not been very much helpful.

Can someone please assist me regarding the problem.

Thanks,

genotype vcf heterozygous allele balance SNP • 4.4k views
ADD COMMENT
0
Entering edit mode
8.7 years ago

using Bioalcidae: https://github.com/lindenb/jvarkit/wiki/BioAlcidae

$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz" | gunzip -c |\
java -jar ~/src/jvarkit-git/dist/bioalcidae.jar -F vcf -e 'while(iter.hasNext()) {var ctx=iter.next(); for(var i=0;i< ctx.getNSamples();++i) {var g=ctx.getGenotype(i); if(!g.isHet()) continue; out.println(ctx.getContig()+" "+ctx.getStart()+" "+g.getSampleName()+" "+g.getAlleles()); }}'



X 60009 HG00142 [A*, C]
X 60009 HG00148 [A*, C]
X 60009 HG00231 [A*, C]
X 60009 HG00638 [A*, C]
X 60009 NA12046 [A*, C]
X 60009 NA12249 [A*, C]
X 60009 NA12813 [A*, C]
X 60009 NA18519 [A*, C]
X 60009 NA18541 [A*, C]
X 60009 NA18633 [A*, C]

I leave calculation of AB as an exercise :-)

ADD COMMENT
0
Entering edit mode

Hi @Pierre'

Thanks for the support. I will try your script tomorrow, also I want to get only the GT:0/1 or GT/1/2 separately. Something for calculating AB will be great.

Thanks,

ADD REPLY
0
Entering edit mode

Hi Pierre, I installed and ran the program but getting some error.

Command: curl -s "raw01_variants_S1-forTest.vcf" | gunzip -c | java -jar /home/everestial007/jvarkit/dist/bioalcidae.jar -F vcf -e 'while(iter.hasNext()) {var ctx=iter.next(); for(var i=0;i< ctx.getNSamples();++i) {var g=ctx.getGenotype(i); if(!g.isHet()) continue; out.println(ctx.getContig()+" "+ctx.getStart()+" "+g.getSampleName()+" "+g.getAlleles()); }}'

Error message: [main] ERROR jvarkit - Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file [main] ERROR jvarkit - Command failed

I think the header of the VCF file always starts with #. Since CHROM is in the first column its typically #CHROM.

Thanks,

ADD REPLY
0
Entering edit mode

why did you run 'gunzip -c ' if the input is not a gzipped source ?

ADD REPLY
2
Entering edit mode

why do you use curl ? please, try to understand the command line before running it.

ADD REPLY
0
Entering edit mode

Hi Pierre,

Thanks for the command but I a little not savvy with manipulating the commands. I will give a try. I had gzipped the file so used gunzip, but the posted command just had .vcf.

Will let you know !

ADD REPLY

Login before adding your answer.

Traffic: 2363 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6