how to extract unique variants from GVCF
1
1
Entering edit mode
6.3 years ago

[note: cross-posted on GATK forum - still awaiting a response]

I have a GVCF (generated using GATK's HaplotypeCaller w/ -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample. Short of creating 36 N-1 GVCFs for discordance testing, or extracting individual sample VCFs for subtraction, is there a straightforward method to obtain the desired information?

GATK variant-calling GVCF filtering • 3.7k views
ADD COMMENT
4
Entering edit mode
6.3 years ago

merge your g.vcf files with CombineGVCFs https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineGVCFs.php

and then I would use bioalcidaejdk to find the sample containing a singleton: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

java -jar dist/bioalcidaejdk.jar -e 'stream().forEach(V->{final List<Genotype> L=V.getGenotypes().stream().filter(G->G.isHet() || G.isHomVar()).collect(Collectors.toList());if(L.size()!=1) return;final Genotype g=L.get(0);println(V.getContig()+" "+V.getStart()+" "+V.getReference()+" "+g.getSampleName()+" "+g.getAlleles());});'  input.vcf
  • stream(): get a stream of variants
  • final List<Genotype> L=V.getGenotypes().stream() : get a stream of genotpes for variant V
  • filter(G->G.isHet() || G.isHomVar()). get only genotypes HET or HOM_VAR
  • collect(Collectors.toList()); convert to list
  • if(L.size()!=1) return; list must contain a singleton
  • final Genotype g=L.get(0); the singleton
  • println(V.getContig()+" "+V.getStart()+" "+V.getReference()+" "+g.getSampleName()+" "+g.getAlleles()); print the singleton
ADD COMMENT
1
Entering edit mode

As usual, @PierreLindenbaum has the right tool for the job. Thanks!

ADD REPLY
0
Entering edit mode

Now this post needs to be added to the list there, I guess?

ADD REPLY
0
Entering edit mode

Will do. I cross-posted here only b/c the response time there can be... slow.

ADD REPLY
1
Entering edit mode

Sorry, what? I was saying that this post would be linked in Pierre's post page, which he has done now.

ADD REPLY
1
Entering edit mode

@Ram: Sorry for the confusion. I thought your comment was directed at me, since I mentioned cross-posting on GATK. I still intend to share Pierre's solution there, so others will be aware of it.

ADD REPLY
0
Entering edit mode

Ah, OK. No, I was referring to the cross-linking that Pierre does on his tool-page. Sorry, I should have been clearer in my comment.

ADD REPLY
0
Entering edit mode

I added -F VCF to run the code while used for gvcf file. Is there any user manual to understand the syntax?

ADD REPLY

Login before adding your answer.

Traffic: 1682 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6