How to get the information of total number of homozygous and heterozygous non-reference from a multi sample VCF file?
1
1
Entering edit mode
6.6 years ago
aneekbiotech ▴ 10

How to get the information of total number of homozygous and heterozygous non-reference from a multi sample VCF file? As an example, I want the information like if a nonreference allele is B, then how many samples having AB and BB in that VCF file. Basically the number of samples. Is there any software tool available for that?

allele • 2.5k views
ADD COMMENT
0
Entering edit mode

You can find that out by looking at the GT genotype field in the VCF file.

I recommend using the VariantAnnotation R package as you can do it in two lines: 1 to load the VCF, another to parse and summarise the VCF genotypes.

ADD REPLY
0
Entering edit mode

@ d-cameron,

Thank you very much for prompt reply, however I am not very familiar with the programming language. Is there any other way to do that, i.e. using available software like VCFtools, BCFtools, GATK. Also I want to add this information for each alternate allele as column in my annovar annotated variant file (in excel)..

ADD REPLY
1
Entering edit mode
6.6 years ago

using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

  • stream all vcf
  • convert to a stream of genotype
  • filter out the HOM_REF
  • convert to sampleName+"\t"+genotype-type
  • count

ex:

java -jar dist/bioalcidaejdk.jar -e 'stream().flatMap(V->V.getGenotypes().stream()).filter(G->!G.isHomRef()).map(G->G.getSampleName()+"\t"+G.getType().name()).collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).forEach((K,V)->println(K+"\t"+V));' src/test/resources/rotavirus_rf.vcf.gz

S3  HOM_VAR 8
S2  HOM_VAR 8
S1  HET 7
S2  HET 7
S3  HET 7
S5  HOM_VAR 8
S4  HET 7
S1  HOM_VAR 2
S4  HOM_VAR 7

another one: today, I wrote a GUI http://lindenb.github.io/jvarkit/VcfStatsJfx.html, one of the screens is a count of genotype-type per sample:

ADD COMMENT

Login before adding your answer.

Traffic: 1552 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6