Entering edit mode
6.7 years ago
fwuffy
▴
110
I know there are some good homozygous calls in this VCF, yet vcf-tools can not see them for whatever reason. Any filtering for MAF or alt read count results in total elimination of all 1/1 homozygous calls.
an example VCF row:
chr3 69313555 rs1483895 C T 222 . AC1=2;AC=4;AF1=1;AN=4;ANN=T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005989.1|protein_coding|1/23|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|NM_015123.2|protein_coding|1/22|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005992.1|protein_coding|1/23|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005994.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_005264722.1|protein_coding|1/22|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005990.1|protein_coding|1/23|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005993.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005991.1|protein_coding|2/24|c.1-38G>A||||||;ASP;CAF=0.393,0.607;COMMON=1;DP4=0,0,4,30;DP=36;FQ=-77.9863;G5;G5A;GENEINFO=FRMD4B:23150;GNO;HD;INT;KGPhase1;KGPhase3;MQ0F=0;MQ=60;MQSB=1;RS=1483895;RSPOS=69313555;RV;SAO=0;SF=0,1;SGB=-0.690438;SLO;SSR=0;TOPMED=0.394567,0.605433;VC=SNV;VDB=0.0219339;VLD;VP=0x05010008000517053f000100;WGT=1;dbSNPBuildID=88 GT:DP:PL 1/1:17:255,51,0 1/1:17:255,51,0
please note the duplicate GT:DP:PL fields at the end. This may be a problem.
command:
vcftools --vcf my.vcf --minDP 5 --minQ 30 --maf 0.00001 --remove-indels --recode --recode-INFO-all -c | grep '1/1:'
... nothing
EDIT:
The duplicate last column was not the problem, I removed it and got the same result. This problem occurs with both --maf and --mac > 0. --mac 0 returns rows, --mac 1 returns none.
I would encourage you to not use VCFtools for this. Switch to BCFtools and then try again. Due to the wide range of formatting that VCFs can have, these tools can sometimes struggle to correctly parse the information. I admit that it is difficult for the developers of these tools to account for every possibility.
I think that you may need something like
bcftools view -i 'MAF[0]<0.0001'
If that doesn't work, take a look at my script here, which can technically allow you to manual calculate the MAF, indirectly: A: How to get sample names and genotype for SNP in multi-sample VCF file