filter multisample VCF based on Altered AD values
4.6 years ago
cocchi.e89 ▴ 290

I have a multisample VCF. An example line:

1   14464   .   A   T   .   .   ECNT=1;PON;DP=67;MBQ=0,36;MFRL=0,278;MMQ=60,28;MPOS=23;POPAF=0.69;TLOD=29.47    GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,17:0.947:17:0,9:0,8:0,0,14,3  ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. 0/1:1,25:0.929:26:1,14:0,10:1,0,17,8    ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. 0/1:1,12:0.866:13:0,5:1,6:1,0,5,7   ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. 0/1:0,9:0.912:9:0,4:0,5:0,0,7,2 ./.:.:.:.:.:.:.

I need to filter samples by their alternate-allele depth (the second value of AD), removing samples with Alt AD < 10. In the example above, this means removing the 4th called sample (Alt AD 9) and keeping the first 3, giving something like this:

1   14464   .   A   T   .   .   ECNT=1;PON;DP=67;MBQ=0,36;MFRL=0,278;MMQ=60,28;MPOS=23;POPAF=0.69;TLOD=29.47    GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,17:0.947:17:0,9:0,8:0,0,14,3  ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. 0/1:1,25:0.929:26:1,14:0,10:1,0,17,8    ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. 0/1:1,12:0.866:13:0,5:1,6:1,0,5,7   ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. ./.:.:.:.:.:.:.

Is there a tool for that? I saw vcffilterjs in this post, but it works differently: it removes the whole line if no sample meets the condition and keeps the whole line if at least one sample passes the filter.

Thanks a lot in advance for any help!


What I need is to filter samples based on their Altered AD removing samples with Alt AD < 10. In the example above this would mean to remove the 4th available sample (Alt_AD 9) keeping the first 3, getting something like this:

How could you remove one or more genotypes while keeping the structure of the VCF?


Yes, that is what I mean: is there no way to remove genotype entries while keeping the structure of the VCF (dropping the line if no entries remain)?


Well, you can reset the genotype to './.', but you cannot remove a genotype column: the VCF header line with the sample names would then be meaningless and broken.


Of course, not removing, sorry; I meant setting it to ./. Is there any tool for that?

4.6 years ago

Using VcfFilterJdk (http://lindenb.github.io/jvarkit/VcfFilterJdk.html):

java -jar dist/vcffilterjdk.jar --recalc -f biostar.code input.vcf.gz

with biostar.code :

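// Rebuild the variant, resetting genotypes whose ALT depth is below 10 to missing
return new VariantContextBuilder(variant).
    genotypes( variant.getGenotypes().stream().map(G->{
        // keep no-call genotypes and genotypes without AD unchanged
        if(!G.isCalled()) return G;
        if(!G.hasAD()) return G;
        final int ad[] = G.getAD();
        // keep the genotype unless AD has exactly two values (REF,ALT) and ALT depth < 10
        if(ad==null || ad.length!=2 || ad[1]>=10) return G;
        // otherwise replace it with a missing genotype (./.) for the same sample
        return  GenotypeBuilder.createMissing(G.getSampleName(),G.getPloidy());
        }).
        collect(Collectors.toList())).
    make();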
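
For readers without jvarkit at hand, the same per-genotype logic can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not a replacement for a proper VCF library: the function name `mask_low_alt_ad` is hypothetical, data lines are assumed to be tab-separated, GT is assumed to be the first FORMAT key, and (as in the snippet above) only genotypes whose AD has exactly two values are considered. Unlike the snippet above, it also returns `None` when no called genotype remains, so that the caller can drop the line as asked in the comments.

```python
from typing import Optional

MIN_ALT_AD = 10  # threshold from the question: mask samples with Alt AD < 10


def mask_low_alt_ad(line: str, min_alt_ad: int = MIN_ALT_AD) -> Optional[str]:
    """Reset genotypes with Alt AD < min_alt_ad to missing on one VCF line.

    Returns the rewritten line, or None when no called genotype remains.
    Header lines and lines without sample columns or AD are returned as-is.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no sample columns on this line
    keys = fields[8].split(":")
    if "AD" not in keys:
        return line
    ad_idx = keys.index("AD")
    # A missing entry with the same number of FORMAT keys, e.g. ./.:.:.:.
    missing = ":".join("./." if key == "GT" else "." for key in keys)

    called = 0
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # Assumption: GT is the first key; skip genotypes that are no-calls.
        if set(values[0]) <= set("./|"):
            continue
        ad = values[ad_idx].split(",") if ad_idx < len(values) else []
        # Only a two-value AD (REF,ALT) is filtered; anything else is kept.
        if len(ad) == 2 and ad[1].isdigit() and int(ad[1]) < min_alt_ad:
            fields[i] = missing
        else:
            called += 1
    return "\t".join(fields) if called else None
```

To filter a whole file, one would apply the function to every line read from the VCF and write out each result that is not `None`.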
14 months ago

You can try vcffilter (from vcflib) with a genotype filter: vcffilter -g "AD > 10" xxx.vcf

For example:

$ less xxx.vcf
chr1 4987481 chr1:4987481:OG A [chr15:68846507[A 250 PASS ABHet=0.4722;ABHom=1;AC=1;AF=0.0001139;AN=8776;END=4987481;MaxAAS=17;MaxAASR=0.4722;NHet=1;NHomAlt=0;NHomRef=4387;NUM_MERGED_SVS=2;PASS_AC=1;PASS_AN=8774;PASS_ratio=0.9998;QD=25;RefLen=1;SVMODEL=AGGREGATED;SVTYPE=BND;SV_ID=207;SeqDepth=142303;VarType=OG GT:AD:MD:DP:GQ:PL 0/0:21,0:0:21:60:0,60,255 0/0:19,0:0:19:60:0,60,255 0/0:31,0:0:31:99:0,99,255 0/0:30,0:0:30:99:0,99,255 0/0:36,0:0:36:99:0,99,255 ...

$ vcffilter -g "AD > 30" xxx.vcf >xxx.retain.ADgt30.vcf

$ less xxx.retain.ADgt30.vcf
chr1 4987481 chr1:4987481:OG A [chr15:68846507[A 250 PASS ABHet=0.4722;ABHom=1;AC=1;AF=0.0001139;AN=8776;END=4987481;MaxAAS=17;MaxAASR=0.4722;NHet=1;NHomAlt=0;NHomRef=4387;NUM_MERGED_SVS=2;PASS_AC=1;PASS_AN=8774;PASS_ratio=0.9998;QD=25;RefLen=1;SVMODEL=AGGREGATED;SVTYPE=BND;SV_ID=207;SeqDepth=142303;VarType=OG GT:AD:MD:DP:GQ:PL . . 0/0:31,0:0:31:99:0,99,255 . 0/0:36,0:0:36:99:0,99,255 0/0:36,0:0:36:99:0,99,255 . 0/0:46,0:0:46:99:0,150,255 0/0:34,0:0:34:99:0,99,255 0/0:42,0:0:42:99:0,125,255 0/0:35,0:0:35:99:0,99,255 . 0/0:40,0:0:40:99:0,125,255 0/0:36,0:0:36:99:0,99,255...

Genotypes that do not meet the criterion have all been replaced with ".".
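
Note that in the example output the retained genotypes are 0/0 with AD values such as 31,0, so the expression there was evidently not restricted to the alternate-allele depth. Before relying on either approach, it may be worth checking the Alt AD of the genotypes that remain called. A minimal Python sketch for that check follows; the helper name `called_alt_depths` is hypothetical, and it assumes tab-separated data lines with GT as the first FORMAT key and AD present in FORMAT.

```python
from typing import List, Tuple


def called_alt_depths(line: str) -> List[Tuple[int, int]]:
    """Return (sample_index, alt_depth) for every called genotype on a VCF line.

    sample_index is 0-based over the sample columns. Entries that are
    missing ('.', './.' and so on) or lack a usable AD are skipped.
    """
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    ad_idx = keys.index("AD")  # raises ValueError if FORMAT has no AD
    result = []
    for col, sample in enumerate(fields[9:]):
        values = sample.split(":")
        # Skip no-calls, including a bare '.' for the whole sample entry.
        if set(values[0]) <= set("./|"):
            continue
        if ad_idx >= len(values):
            continue
        ad = values[ad_idx].split(",")
        # The second AD value is the depth of the first alternate allele.
        if len(ad) < 2 or not ad[1].isdigit():
            continue
        result.append((col, int(ad[1])))
    return result
```

If any reported depth is below the intended threshold (10 in the original question), the filter expression did not test the alternate-allele depth.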
