Dear biostars,
I have .vcf.gz files containing a few dozen samples. From here I wish to identify the HET and HOM variants to better understand the population I am interested in.
I was informed that bcftools is a good option, and I have tried following the instructions in the documentation; however, I believe I am misunderstanding some of the syntax. Below are my attempts and the common error I keep running into.
First, to check that my vcf.gz files have the fields of interest, here is the output from query.
bcftools query -f '%CHROM %POS %REF %ALT\n' $In | head -3
chrM 285 C CA
chrM 299 CA C
chrM 302 AC A
Then I try to pipe query to fill-tags to retrieve HET and HOM genotypes. Here is the code and the error.
bcftools query -f '%CHROM %POS %REF %ALT\n' $In |\
bcftools +fill-tags -o $Out -- -t AC_Het,AC_Hom
Failed to open -: unknown file type
I have even tried to skip piping and make an intermediary .bcf file instead. Below is code and the error.
bcftools query -f '%CHROM %POS %REF %ALT\n' $In -o $Temp
bcftools +fill-tags $Temp -Ou -o $Out -- -t AC_Het,AC_Hom
Failed to open X.Indel.tmp.bcf: unknown file type
I am clearly not importing the .bcf file correctly. Appreciate any help I can get with this issue.
Hi @4galaxy77, thanks, I think your suggestion has helped! The idea was to eventually import the data into R. I thought to use query -> +fill-tags, but based on your description I may have had the tools in the wrong order. :\
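For anyone who finds this later, here is a minimal sketch of the reversed order as I understand it: +fill-tags runs first on the real VCF, and query comes last to flatten the result to text. As far as I can tell, the "unknown file type" errors above came from handing +fill-tags the plain-text output of query, which is no longer a VCF/BCF. The paths in In and Out and the function name het_hom_table are placeholders of my own, not anything from bcftools, and I have not checked this against every bcftools version.

```shell
#!/usr/bin/env bash
# Sketch only: In and Out are placeholder paths (assumptions), as in the post above.
In="${In:-input.vcf.gz}"
Out="${Out:-het_hom.tsv}"

# het_hom_table is a hypothetical helper name used only for this example.
het_hom_table() {
    # Step 1: +fill-tags reads a genuine VCF/BCF and adds INFO/AC_Het and INFO/AC_Hom.
    # Step 2: query converts the annotated stream into a plain tab-separated table.
    bcftools +fill-tags "$1" -Ou -- -t AC_Het,AC_Hom |
        bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/AC_Het\t%INFO/AC_Hom\n'
}

# Run only when bcftools is installed and the input file exists.
if command -v bcftools >/dev/null 2>&1 && [ -f "$In" ]; then
    het_hom_table "$In" > "$Out"
fi
```

The resulting file has no header line, so in R something like read.delim(Out, header = FALSE) should load it. My understanding is that AC_Het and AC_Hom are allele counts rather than sample counts, so the number of homozygous samples would be AC_Hom divided by 2 for diploid calls.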