Is it possible to query the columns of particular samples in a VCF?
2.4 years ago

I would like to extract specific samples from a VCF at the speed of bcftools query. Is that possible? Here is an example that obviously does not work:

bcftools query -f '%CHROM %POS %Sample1 %Sample2'
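For reference, the usual bcftools query idiom selects samples with `-s` and puts per-sample fields inside square brackets, which query repeats once for each selected sample. A minimal sketch wrapped in a shell function (the name `query_samples` is hypothetical), assuming `Sample1,Sample2` match the names in the VCF header exactly (`bcftools query -l` lists them):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS and the GT of each selected sample.
# $1 = input VCF/BCF, $2 = comma-separated sample names (e.g. Sample1,Sample2).
query_samples() {
    local vcf=$1 samples=$2
    # -s restricts output to the listed samples; the [...] block is repeated
    # once per selected sample, so each line carries one GT column per sample.
    bcftools query -s "$samples" -f '%CHROM\t%POS[\t%GT]\n' "$vcf"
}
```

Usage: `query_samples in.vcf.gz Sample1,Sample2`. Writing `[\t%SAMPLE=%GT]` instead labels each genotype with its sample name, and `-H` adds a header line.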

2.4 years ago

bcftools view --samples S1,S2 in.vcf


bcftools view appears to be very slow at extracting individual samples: for my 70-gigabit VCF it takes 1.5 h to extract one sample.


Split by region, run the jobs in parallel, and run bcftools concat at the end.
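A minimal sketch of that split/parallel/concat approach, assuming a bgzipped and indexed input (`bcftools index` or `tabix`), since `-r` needs the index to seek; `tabix -l` lists the contigs in that index. The function name and temporary-directory layout are hypothetical, one background job is started per contig with no cap on parallelism, and contig names are assumed to contain no whitespace or colons (a colon could be misread as a region range).

```shell
#!/usr/bin/env bash
# Hypothetical helper: subset samples contig by contig in parallel, then concatenate.
# $1 = bgzipped + indexed VCF, $2 = comma-separated samples, $3 = output .vcf.gz
subset_by_contig() {
    local vcf=$1 samples=$2 out=$3
    local tmpdir contig part parts= pids= pid status=0
    tmpdir=$(mktemp -d) || return 1

    # One background bcftools view per contig listed in the index.
    for contig in $(tabix -l "$vcf"); do
        part="$tmpdir/$contig.vcf.gz"
        parts="$parts $part"
        bcftools view -r "$contig" -s "$samples" -Oz -o "$part" "$vcf" &
        pids="$pids $!"
    done

    # Wait for every job and remember whether any of them failed.
    for pid in $pids; do
        wait "$pid" || status=1
    done

    # $parts is left unquoted on purpose (one word per file); the parts are
    # listed in index order, which is the order concat expects.
    if [ "$status" -eq 0 ]; then
        bcftools concat -Oz -o "$out" $parts || status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

Usage: `subset_by_contig in.vcf.gz S1,S2 S1_S2.vcf.gz`. On a shared machine you would probably want to limit how many jobs run at once.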


In this case, 70 segments × 600 samples would mean 42,000 jobs, which seems risky. But I will try.


Uh? These are only 70 jobs (70 × extracting two samples), unless I misunderstood your question.


I didn't frame the problem correctly. The goal is to make an individual VCF for each of the 600 samples, which is extremely slow with bcftools view.

I figured I could accomplish this with bcftools query and then subtract the genotypes present in my target sample. However, I don't know how to do that.

I will reframe the question and make another post.
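The query-then-filter idea described above can be sketched with bcftools alone: list the samples with `bcftools query -l`, then for each sample keep only the sites where its genotype carries an alternate allele (`-i 'GT="alt"'`). Note that this writes per-sample tab-separated tables rather than VCFs, and it still reads the whole input once per sample (600 passes here), so it does not by itself fix the slowness. The function name, output directory and `.tsv` naming are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every sample, write the sites where it carries an ALT allele.
# $1 = input VCF/BCF, $2 = output directory (one <sample>.tsv per sample).
alt_sites_per_sample() {
    local vcf=$1 outdir=$2 names sample
    mkdir -p "$outdir" || return 1
    # bcftools query -l prints the sample names from the header, one per line.
    names=$(bcftools query -l "$vcf") || return 1
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue
        # -s keeps only this sample; -i 'GT="alt"' keeps sites where its GT
        # contains a non-reference allele. Each call is a full pass over $vcf.
        bcftools query -s "$sample" -i 'GT="alt"' \
            -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' "$vcf" > "$outdir/$sample.tsv" || return 1
    done <<< "$names"
}
```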

2.4 years ago

I found a viable solution using the bcftools +split plugin, e.g.:

bcftools +split file.vcf -Oz -o testDir -i'GT="alt"'
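Since +split writes one file per sample into the output directory, a cheap sanity check afterwards is to compare the number of files written with the number of samples in the header. A minimal sketch, assuming the output directory starts empty and that `-Oz` output files end in `.vcf.gz`; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run +split as above, then check one .vcf.gz exists per sample.
# $1 = input VCF/BCF, $2 = output directory (assumed empty beforehand).
split_and_check() {
    local vcf=$1 outdir=$2 expected written
    mkdir -p "$outdir" || return 1
    # Same invocation as in the answer above, including the GT="alt" filter.
    bcftools +split "$vcf" -Oz -o "$outdir" -i 'GT="alt"' || return 1
    expected=$(bcftools query -l "$vcf" | wc -l)
    written=$(find "$outdir" -maxdepth 1 -name '*.vcf.gz' | wc -l)
    # $((...)) strips the padding some wc implementations add.
    if [ "$((written))" -ne "$((expected))" ]; then
        echo "expected $((expected)) files, found $((written))" >&2
        return 1
    fi
}
```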
