VCF Filtering
9 weeks ago

Hi, I recently obtained a VCF file with 90 samples from different generations, and the file is extremely large. This is because it contains many likely sequencing errors: I observe AF values very close to 0 or 1, and only in one of the 90 samples. For this reason, I want to filter the whole file, keeping only the SNPs for which at least one sample has an AF between 0.15 and 0.85. Do you know how I can do this?

I tried this command: "bcftools view -i 'sum(AF >= 0.15 && AF <= 0.85) > 0'", but it didn't give me the expected result. Thank you!

AF is usually an INFO field, not a FORMAT (per-genotype) field. What is the definition of AF in your header? Please also give us an example of such genotypes.
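Whether AF is a site-level (INFO) or per-sample (FORMAT) field determines how any filter must be written, so a quick check of the header helps. Below is a minimal Java sketch that scans the meta-information lines and prints how AF is declared. The class name AfHeaderCheck is hypothetical, and the regex assumes ID comes first and Number second inside the angle brackets, which is the usual layout of VCF header lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports whether AF is declared under ##INFO or ##FORMAT.
class AfHeaderCheck {
    // Assumes the layout ##INFO=<ID=AF,Number=...> or ##FORMAT=<ID=AF,Number=...>
    private static final Pattern AF_DECL =
            Pattern.compile("^##(INFO|FORMAT)=<ID=AF,Number=([^,>]+)");

    /** Returns e.g. "FORMAT Number=A" if this header line declares AF, otherwise empty. */
    static Optional<String> describeAf(String headerLine) {
        Matcher m = AF_DECL.matcher(headerLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + " Number=" + m.group(2));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        // Meta-information lines start with "##"; stop at the #CHROM line.
        while ((line = in.readLine()) != null && line.startsWith("##")) {
            describeAf(line).ifPresent(System.out::println);
        }
    }
}
```

With Java 11 or later it can be run directly from source, e.g. `zcat in.vcf.gz | java AfHeaderCheck.java`. An output of "FORMAT ..." means the filter has to look at each sample's AF rather than a single per-site value.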

Hi, the definition of AF in my header is this one:

##FORMAT=<ID=AF,Number=A,Type=Float,Description="Observed ALT frequencies">

And here is an example of the genotypes:

GT:PL:DP:AD:SCR:QS:SAD:AF:XF    ./.:.:48:19,21:12:540,524:40:0.525:0.511628     ./.:.:42:24,13:9:537,305:37:0.351351:0.35

Thank you in advance

I don't understand what AF refers to here, given that the genotype is ./.

Are these diploid organisms?

9 weeks ago

Using vcffilterjdk (https://jvarkit.readthedocs.io/en/latest/VcfFilterJdk/):

java -jar dist/jvarkit.jar vcffilterjdk -e 'return variant.getGenotypes().stream().anyMatch(G->{double v=G.getAttributeAsDouble("AF",-1);return v>=0.15 && v<=0.85;});'  in.vcf
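For readers without jvarkit, the same idea (keep a record if any sample has FORMAT/AF between 0.15 and 0.85) can be sketched in plain Java. This is a minimal sketch, not the vcffilterjdk implementation: it assumes an uncompressed VCF on standard input with AF present in the FORMAT column, and the class name IntermediateAfFilter is hypothetical. Because the header quoted in this thread declares AF with Number=A, the sketch also splits comma-separated values (one per ALT allele at multi-allelic sites) and skips missing values ("."). Like the expression above, it does not check the variant type.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical plain-Java equivalent of "anyMatch(sample AF in [0.15, 0.85])".
class IntermediateAfFilter {
    static final double MIN_AF = 0.15;
    static final double MAX_AF = 0.85;

    /** True if at least one sample has a FORMAT/AF value within [MIN_AF, MAX_AF]. */
    static boolean keep(String dataLine) {
        String[] cols = dataLine.split("\t");
        if (cols.length < 10) {
            return false; // no sample columns
        }
        // Column 9 (index 8) is FORMAT, e.g. GT:PL:DP:AD:SCR:QS:SAD:AF:XF
        String[] keys = cols[8].split(":");
        int afIdx = -1;
        for (int i = 0; i < keys.length; i++) {
            if (keys[i].equals("AF")) {
                afIdx = i;
                break;
            }
        }
        if (afIdx < 0) {
            return false; // this record has no per-sample AF
        }
        for (int s = 9; s < cols.length; s++) {
            String[] vals = cols[s].split(":");
            if (afIdx >= vals.length) {
                continue; // trailing FORMAT fields may be omitted in a sample
            }
            // Number=A: one AF per ALT allele, comma-separated
            for (String v : vals[afIdx].split(",")) {
                if (v.equals(".")) {
                    continue; // missing value
                }
                double af = Double.parseDouble(v);
                if (af >= MIN_AF && af <= MAX_AF) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            // Header lines are always kept; data lines only if keep() is true.
            if (line.startsWith("#") || keep(line)) {
                System.out.println(line);
            }
        }
    }
}
```

With Java 11 or later: `zcat in.vcf.gz | java IntermediateAfFilter.java > filtered.vcf`. Only AF is inspected, so a missing genotype (./.) does not prevent a sample from qualifying. Both samples in the example genotypes above (AF 0.525 and 0.351351) fall in range, so that site would be kept.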