Can a variant exist in a VCF even if no samples have that variant?
23 months ago
Arda • 0

I downloaded a VCF file from 1000genomes and then filtered it down to 5 samples using this command:

bcftools view --samples-file my5RandomIDs.txt ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz -o myNewVCF.vcf

So far so good. It is my understanding that for each sample, 0|0, 0|1, 1|0 and 1|1 show which of the two haplotypes carry the alternate allele, and that 0|0 means the sample does not have the variant. The problem is that I can find quite a few variants (especially structural variants) that are 0|0 for all the samples. This does not make sense to me: if none of the samples have the variant, it should not be in the VCF file. I have attached a screenshot of the file (filtered to only structural variants). What is causing this behavior, or did I misunderstand something fundamental?

Much thanks in advance.

[Screenshot: the subset VCF, filtered to structural variants]

1000genomes bcftools VCF • 692 views

When subsetting samples, you can add the -c 1 (--min-ac 1) parameter to bcftools view to drop sites where none of the remaining samples carries a non-reference allele.
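A rough sketch of how this could look as two explicit steps. The bcftools documentation cautions that combining sample subsetting and allele-count filtering in a single command can depend on the internal order of operations, so the subsetting and the --min-ac filter are separated by a pipe here. The function name subset_nonref and the output file name in the example call are placeholders of mine, not something from the original post.

```shell
# Hypothetical helper: keep the listed samples, then drop sites where
# none of the kept samples carries a non-reference allele.
subset_nonref() {
  samples_file=$1
  in_vcf=$2
  out_vcf=$3
  # Step 1: subset the samples (bcftools view recalculates INFO/AC and
  #         INFO/AN for the subset unless --no-update is given).
  # Step 2: keep sites with at least one non-reference allele left.
  bcftools view --samples-file "$samples_file" -Ou "$in_vcf" |
    bcftools view --min-ac 1 -Ov -o "$out_vcf" -
}

# Example call with the file names from the question (output name is made up):
# subset_nonref my5RandomIDs.txt \
#   ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
#   myNewVCF.nonref.vcf
```

Sites that are 0|0 in all five samples should no longer appear in the output file.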


Please do not paste screenshots of plain-text content; it is counterproductive. You can copy and paste the content directly here (using the code formatting option), or use a GitHub Gist if the content exceeds the length allowed here.

23 months ago
LChart 4.5k

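Yes. Those sites are polymorphic in the full 1000G panel, but your specific 5 samples are all homozygous reference there. Subsetting samples with bcftools view keeps every site from the original file unless you explicitly ask it to drop them.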
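To make that concrete, here is a toy illustration that needs only awk. The two sites and the three samples S1 to S3 are made-up data, not taken from 1000 Genomes, and the helper names toy_vcf and subset_ac are my own. It assumes the FORMAT column contains only GT.

```shell
# Toy VCF: two sites, three samples (hypothetical data).
toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf '22\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|0\t0|1\n'
  printf '22\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t0|0\t1|1\n'
}

# Print POS and the ALT allele count (AC) over the chosen sample columns.
# Genotypes start at column 10; the arguments list which columns to count.
subset_ac() {
  awk -F'\t' -v cols="$*" '
    BEGIN { n = split(cols, keep, " ") }
    /^#/ { next }                                   # skip header lines
    {
      ac = 0
      for (i = 1; i <= n; i++) {
        m = split($(keep[i]), alleles, "[|/]")      # phased or unphased GT
        for (j = 1; j <= m; j++)
          if (alleles[j] != "0" && alleles[j] != ".") ac++
      }
      print $2 "\tAC=" ac
    }'
}

toy_vcf | subset_ac 10 11 12   # all samples:  100 AC=1, 200 AC=3
toy_vcf | subset_ac 10 11      # S1 and S2:    100 AC=0, 200 AC=1
```

Position 100 is a real variant in the full toy panel because S3 carries it, yet it has AC=0 once only S1 and S2 are kept. That is the situation with the all-0|0 lines in the subset file.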

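If you want to see only variant positions in your subset, you can pipe the output into bcftools filter -i 'AC>0'. By default bcftools view recalculates INFO/AC for the samples it keeps, so the filter reflects your 5 samples rather than the full panel.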
