Hi All,
I am looking for a way (possibly using bcftools) to extract the samples that carry a given variant, but I haven't found one yet.
For instance:
CHROM POS REF ALT SAMPLE.A SAMPLE.B SAMPLE.C
chr1 10 A TAA 0/0 0/1 1/1
chr1 10 C GGA 0/0 0/0 0/1
I want the samples that are Het or Hom for the variant 1-10-A-TAA, so the ideal output should be something similar to:
chr1 10 A TAA SAMPLE.B SAMPLE.C
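To make the selection rule concrete, here is a minimal sketch that applies it to the toy table above with plain awk. It is only an illustration of the logic, not a bcftools solution and not something meant for huge files: `toy_genotypes.txt` and the `carriers` helper are hypothetical names, the input is the whitespace-separated toy table rather than a real VCF/BCF, and allele `1` is assumed to be the only ALT allele on each row.

```shell
# Toy table from the example above (hypothetical file, not a real VCF).
cat > toy_genotypes.txt <<'EOF'
CHROM POS REF ALT SAMPLE.A SAMPLE.B SAMPLE.C
chr1 10 A TAA 0/0 0/1 1/1
chr1 10 C GGA 0/0 0/0 0/1
EOF

# carriers CHROM POS REF ALT FILE
# Hypothetical helper: prints the variant followed by every sample whose
# genotype contains the ALT allele, i.e. Het (0/1) or Hom (1/1).
carriers() {
  awk -v chrom="$1" -v pos="$2" -v ref="$3" -v alt="$4" '
    NR == 1 { for (i = 5; i <= NF; i++) name[i] = $i; next }  # remember sample names
    $1 == chrom && $2 == pos && $3 == ref && $4 == alt {
      line = $1 " " $2 " " $3 " " $4
      for (i = 5; i <= NF; i++) {
        n = split($i, allele, "[/|]")   # accept unphased (/) and phased (|) genotypes
        for (j = 1; j <= n; j++)
          if (allele[j] == "1") { line = line " " name[i]; break }
      }
      print line
    }
  ' "$5"
}

carriers chr1 10 A TAA toy_genotypes.txt
# prints: chr1 10 A TAA SAMPLE.B SAMPLE.C
```

Matching on CHROM, POS, REF and ALT together matters here because both example rows share the same position.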
The files I am working with are huge, so I need the most computationally efficient approach.
So far I have come up with this:
bcftools query -r chr1:1-15 -i 'GT="AA" | GT="AR"' -f '%CHROM %POS %REF %ALT [\t%SAMPLE=%GT]\n' file_name.bcf
Any suggestions that could improve the speed?
Thanks to whoever spends some time helping me.
Hi Pierre,
Thanks a lot for your quick answer. Unfortunately, it is not that easy to try it: I am working on an HPC cluster and they are quite strict about installing new packages. I'll try to git clone it.
Thanks again.
Compile it on your side and 'scp' the tool to your cluster...