3.1 years ago
tacrolimus
I have a multi-sample VCF file and I want a table with sample IDs in the left column, followed by the variants at which each sample carries an alternate allele. It should look like this:
ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1
ID2 chr2:87432:A:T_1/1
ID3 chr11:432434:T:G chr14:34234234:C:G chr20:34324234:T:C
The table will then be read into R.
I have tried combinations of:
bcftools query -f '[%SAMPLE\t] %CHROM:%POS:%REF:%ALT[%GT]\n'
but I keep getting sample IDs overlapping on the same line and I can't quite figure out the syntax.
Your help would be much appreciated.
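For what it's worth, the overlapping IDs come from the format string itself: in bcftools query, anything inside [ ] is repeated for every sample, so '[%SAMPLE\t]' prints all sample names on each variant line before the site fields. A minimal sketch of one possible way around this is below; it is not necessarily the approach used elsewhere in this thread. It prints one sample/variant/genotype triple per line, keeps only genotypes with an alternate allele via -i 'GT="alt"', and then groups by sample with awk. The function names collapse_by_sample and sample_variant_table and the file name input.vcf.gz are made up for illustration. The genotype is printed as its own tab-separated column and joined with "_" in awk, on the assumption that writing %ALT_%GT directly could be misread as a tag called ALT_.

```shell
#!/bin/sh

# collapse_by_sample (hypothetical helper): reads tab-separated
# "SAMPLE<TAB>VARIANT<TAB>GT" lines on stdin and prints one line per sample:
# the sample ID followed by all of its VARIANT_GT entries, tab-separated.
# Samples are printed in order of first appearance.
collapse_by_sample() {
    awk -F'\t' '
        !($1 in row) { order[++n] = $1; row[$1] = $1 }   # first time this sample is seen
        { row[$1] = row[$1] "\t" $2 "_" $3 }              # append variant_genotype
        END { for (i = 1; i <= n; i++) print row[order[i]] }
    '
}

# sample_variant_table (hypothetical wrapper): $1 is the VCF/BCF file.
# The [ ] block is repeated per sample, and with -i GT="alt" only samples
# carrying an alternate allele at that site are printed.
sample_variant_table() {
    bcftools query -i 'GT="alt"' \
        -f '[%SAMPLE\t%CHROM:%POS:%REF:%ALT\t%GT\n]' "$1" |
        collapse_by_sample
}

# Usage (assumed file names):
#   sample_variant_table input.vcf.gz > table.tsv
```

The rows have different numbers of columns, so in R something like read.table(..., fill = TRUE) or readLines() plus strsplit() is probably needed. Samples with no alternate allele at any site will not appear in the table at all.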
Say my VCF has 80,000 samples: would this take a very long time (for 150 variants)?
just try....
Thanks Pierre, this worked well. There were two lines that oddly contained lots of sample IDs repeated with no spaces between them in one very long row; not sure what that is about. Also, in the error log it seems to fail to read from the VCF in the second chunk of code.
83 hours! It wasn't working because I set it up as a short job, which was my bad!