Creating a per sample file from multi-sample vcf
3.1 years ago
tacrolimus ▴ 140

I have a multi-sample VCF file and I want a table with the sample IDs in the left column, followed by the variants for which each sample carries an alternate allele. It should look like this:

ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1
ID2 chr2:87432:A:T_1/1
ID3 chr11:432434:T:G chr14:34234234:C:G chr20:34324234:T:C

The table will then be read into R.

I have tried combinations of:

bcftools query -f '[%SAMPLE\t] %CHROM:%POS:%REF:%ALT[%GT]\n'

but I keep getting all the sample IDs on the same line and I can't quite figure out the syntax.

Your help would be much appreciated

bcftools SNV SNP vcftools vcf • 1.5k views
3.1 years ago

something like a loop over each sample:

bcftools query -l in.vcf.gz | while read -r S; do echo -n "$S" && bcftools view --samples "$S" -O u in.vcf.gz | bcftools query -i 'AC>0' -f '\t%CHROM:%POS:[%GT]' && echo ; done

I'm too lazy to convert the alleles' indexes to their value...
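
For the allele indexes, bcftools query also has a %TGT tag that prints translated genotypes, which may be all that is needed here. If the conversion has to happen afterwards instead, a minimal awk sketch could look like the following. The function name translate_gt and the three-column, tab-separated input (REF, ALT, GT) are assumptions made for illustration, not part of the answer above.

```shell
# Hypothetical helper: reads tab-separated "REF ALT GT" lines on stdin and
# prints each genotype with its allele indexes replaced by the allele strings.
translate_gt() {
  awk -F'\t' '{
    # allele[0] is REF; allele[1..n] are the comma-separated ALT alleles
    n = split($2, alt, ",")
    allele[0] = $1
    for (i = 1; i <= n; i++) allele[i] = alt[i]
    # keep the original separator: "|" if phased, "/" otherwise
    sep = (index($3, "|") > 0) ? "|" : "/"
    m = split($3, idx, "[/|]")
    out = ""
    for (j = 1; j <= m; j++) {
      # a missing allele (".") is passed through unchanged
      a = (idx[j] == ".") ? "." : allele[idx[j]]
      out = out (j > 1 ? sep : "") a
    }
    print out
  }'
}
```

For example, the line "C<TAB>G,T<TAB>2|1" would be printed as "T|G".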

Say my VCF has 80,000 samples: would this take a very long time (for 150 variants)?

just try....

Thanks Pierre, this worked well. Oddly, two lines contained lots of sample IDs repeated with no spaces between them in one very long row; I'm not sure what that is about. Also, according to the error log, it seems to fail to read from the VCF in the second part of the command...

83 hours! It wasn't working because I had set it up as a short job, which was my bad!
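
The loop in the answer runs bcftools once per sample, so the run time grows with the number of samples. A possible single-pass alternative, as a rough sketch only: let bcftools query print one line per sample and variant, then group those lines by sample with awk. The function name pivot_by_sample and the output file name are placeholders, and the GT="alt" filter is assumed to keep only genotypes that carry an alternate allele, as described in the bcftools filtering-expression documentation.

```shell
# Hypothetical helper: reads "SAMPLE<TAB>VARIANT<TAB>GT" lines on stdin and
# prints one line per sample: SAMPLE VARIANT_GT VARIANT_GT ...
pivot_by_sample() {
  awk -F'\t' '
    # remember the order in which samples first appear
    !($1 in row) { order[++n] = $1; row[$1] = $1 }
    # append this variant and genotype to the row of its sample
    { row[$1] = row[$1] " " $2 "_" $3 }
    END { for (i = 1; i <= n; i++) print row[order[i]] }
  '
}

# Assumed usage (needs bcftools; in.vcf.gz is the input from the thread):
# bcftools query -i 'GT="alt"' -f '[%SAMPLE\t%CHROM:%POS:%REF:%ALT\t%GT\n]' in.vcf.gz \
#   | pivot_by_sample > per_sample.txt
```

Samples without any alternate allele produce no input lines and therefore do not appear in the output. All rows are held in memory until the end, which should be manageable for 80,000 samples and 150 variants.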
