Allele Depth (AD) / Allele Balance (AB) Filtering in GATK 4
1
2
Entering edit mode
6.5 years ago
maegsul ▴ 170

Hi,

I am trying to filter my GATK 4.0.3 - HaplotypeCaller generated multi-sample VCF for allele depth (AD) annotation at sample genotype-level (so available in "FORMAT" fields of each sample).

I think prior to GATK 4, this annotation was available as "Allele Balance" (AB) ratios (generated by AlleleBalanceBySample), but it is not available anymore in GATK 4. So I tried to filter genotypes based on AD field, that is exactly the same thing but indicated in "X,Y" format, so in an array format of integers. This array format makes it difficult to filter based on depth of alternative allele divided by depth of all alleles at a specific site.

What could be a solution to this? If I could turn this array into a ratio, I could easily filter genotypes using VariantFiltration or other tools such as vcflib/vcffilter. I also tried the below code (following https://gatkforums.broadinstitute.org/gatk/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk):

gatk VariantFiltration -R $ref -V $vcf -O $output --genotype-filter-expression 'vc.getGenotype("Sample1").getAD().1 / vc.getGenotype("Sample1").getAD().0 > 0.33' --set-filtered-genotype-to-no-call --genotype-filter-name 'ABfilter'

This worked, but strangely it filters the variant for all samples if only one of the sample have allele depths that are not in balance (defined by the filter). If it worked only for Sample1, I was planning to write a quick loop for all the samples for instance. I tried the same with GATK 3.8, but still it filters whole variant for all the samples if it is filtered in just one sample.

Any suggestions?

PS: I've asked the same question here also in GATK forum: https://gatkforums.broadinstitute.org/gatk/discussion/12140/allele-depth-ad-allele-balance-ab-filtering-in-gatk-4

gatk filtering sequencing allele depth • 10k views
ADD COMMENT
4
Entering edit mode
6.5 years ago

using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

the following expression keep the variants if all the genotypes with AD have AD[1]/AD[0]>0.33 .

java -jar dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().filter(G->G.hasAD()).map(G->G.getAD()).allMatch(A->A.length>1 && A[0]>0 && A[1]/(double)A[0]> 0.33);'  in.vcf
ADD COMMENT
0
Entering edit mode

Thank you very much Pierre! I didn't know about vcffilterjdk - it looks very useful.

However, this command above is not exactly what I wanted. It actually filtered out all of the variants (i.e. at least in one of my samples allelic imbalance is present). What I want is to assign ./. missing genotypes for sample-level genotypes in a VCF, if they fail to pass defined AD ratio (so, keeping the variant itself still if it is present in at least one sample with desired AD ratio thresholds). Is there a way to filter directly genotypes, but not variants?

ADD REPLY
1
Entering edit mode

if I understand well , try the following script in the '-e' option (or use a file with option '-f' )

return new VariantContextBuilder(variant).
    genotypes(
        variant.
        getGenotypes().
        stream().
        map( G->{
            if(G.hasAD()) {
                final int A[]= G.getAD();
                if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
                }
            return  GenotypeBuilder.createMissing(G.getSampleName(),2);
            }).
        collect(Collectors.toList())
        ).
    make();

note: this will replace some genotypes with NO_CALL genotype: you should recalculate INFO data like DP/AF/AC/ etc...

ADD REPLY
0
Entering edit mode

This worked very well, thank you very much!

Only thing is that, I have to modify this script only for heterozygous calls, otherwise my homozygous ref calls are also turned into NO_CALL genotype. Can I add something like .isHet(); for that purpose?

ADD REPLY
1
Entering edit mode
ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode

Yes, I marked it as accepted, thank you very much again!

I couldn't figure out where to add .isHet() in the working script though, I am having some errors with the script...

ADD REPLY
1
Entering edit mode

I don't know where is the logic of your HET, it could look like

 if(G.hasAD() && G.isHet()) { ...)

or

if(G.isHet()) return G;
if(G.hasAD()) { ...

or

if(!G.isHet()) return G;
if(G.hasAD()) { ...
ADD REPLY
0
Entering edit mode

The logic is fine with the first two options you indicated, but I couldn't manage to make it work. Still all the genotypes (regardless of being a heterozygous or homozygous ref call) are filtered based on the custom AD filter. My goal is to assign NO_CALL genotypes only for heterozygous genotypes having AD values out of the indicated ratio.

I tried:

return new VariantContextBuilder(variant).
    genotypes(
        variant.
        getGenotypes()
        stream().
        map( G->{
         if(G.hasAD() && G.isHet()) {
                final int A[]= G.getAD();
                if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
                }
            return  GenotypeBuilder.createMissing(G.getSampleName(),2);
            }).
        collect(Collectors.toList())
        ).
    make();

and

return new VariantContextBuilder(variant).
    genotypes(
        variant.
        getGenotypes()
        stream().
        map( G->{
         if(G.isHet()) return G;
         if(G.hasAD()) {
                final int A[]= G.getAD();
                if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
                }
            return  GenotypeBuilder.createMissing(G.getSampleName(),2);
            }).
        collect(Collectors.toList())
        ).
    make();
ADD REPLY
2
Entering edit mode

My goal is to assign NO_CALL genotypes only for heterozygous genotypes having AD values out of the indicated ratio.

 if(G.hasAD()) {
    if(!G.isHet()) return G;
ADD REPLY
0
Entering edit mode

Yes, now I understood my mistake... Now it works very well!

Thank you very much once more Pierre!

ADD REPLY
0
Entering edit mode

Greetings,

This has been very helpful as I was not aware of vcffilterjdk until this post. I too am trying to filter by AD. However, I for some reason cannot figure out how to prevent it from changing homozygous Ref calls to ./. Can you please advise what I am doing wrong? I have added the snippet of code which should skip over non-het calls but it still changes everything.

Here is how I am calling it:

`java -jar /home/nathanael_herrera/Software/jvarkit/dist/vcffilterjdk.jar -o output.vcf -e 'return new VariantContextBuilder(variant).
genotypes(
    variant.
    getGenotypes().
    stream().
    map( G->{
        if(G.hasAD()) {
            if(!G.isHet()) return G;
            final int A[]= G.getAD();
            if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
            }
        return  GenotypeBuilder.createMissing(G.getSampleName(),2);
        }).
    collect(Collectors.toList())
    ).
make();' input.vcf

Thank you in advance for any advice!

ADD REPLY
0
Entering edit mode

Hi,

I think I have come across with the same problem but with the below code is working well for me, maybe give it a try:

return new VariantContextBuilder(variant).
genotypes(
    variant.
    getGenotypes().
    stream().
    map( G->{
     if(G.hasAD() && G.isHet()) {
            final int A[]= G.getAD();
            if( A.length>1 &&  A[0]>0 && A[1]/(double)A[0] > 0.332 && A[1]/(double)A[0] < 3.01 ) return G;
         }
        if(G.hasAD() && G.isHom()) {
            final int A[]= G.getAD();
         if( A[0] == 0 ) return G;
            if( A.length>1 &&  A[0]>0 && A[1]/(double)A[0] > 0.8999 ) return G;
            }
        if(G.hasAD() && G.isHomRef()) {
            final int A[]= G.getAD();
         if( A[1] == 0 ) return G;
            if( A.length>1 &&  A[1]>0 && A[1]/(double)A[0] < 0.11 ) return G;
            }
        return  GenotypeBuilder.createMissing(G.getSampleName(),2);
        }).
 collect(Collectors.toList())
    ).
make();

Here I also filter ref homozygous and alt homozygous calls based on 1:9 or 9:1 ratios though, in your case you can delete these.

ADD REPLY
0
Entering edit mode

Excellent, works very well! Thank you very much for the response! Cheers

ADD REPLY

Login before adding your answer.

Traffic: 2342 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6