Hi everybody,
I want to remove every site (row) in the VCF file that contains a homozygous reference (0/0) genotype and get the output in VCF format. So, I simply used zgrep -v "0[/|]0" < file1.gz.vcf > output.vcf
. However, it seems that many variants were removed. Just to make sure, could you kindly let me know whether I did this right?
Thanks
As far as I understand, the OP doesn't want ANY HOM_REF sample in the row.
Hmm, yes might be.
In this case:
Right, I just want hetero and homo-var, NOT homo-ref, per row. I previously used
bcftools view -e 'GT[*]="RR"' file.vcf
, but it seems to have worked wrong, as it kept only 530 of 2134618 variants. Our admin has to install Pierre's tool on the cluster before I can test it.
For me this does not sound surprising.
If none of your 1200 samples is hom REF at a given position, this means that what is called REF is very rare at that position. Of course this is possible, but I would guess it does not happen often.
As long as you have Java installed on the cluster, you should be able to run Pierre's program from your own directory.
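To put a rough number on the point about 1200 samples, here is a small back-of-the-envelope sketch. It assumes Hardy-Weinberg proportions, unrelated samples and no missing genotypes, none of which is stated in the thread, and the function name prob_no_hom_ref is made up for illustration. It only shows how quickly the chance of seeing no 0/0 genotype at a site drops as the REF allele becomes more common.

```python
def prob_no_hom_ref(ref_freq: float, n_samples: int) -> float:
    """Probability that none of n_samples independent samples is 0/0.

    Assumes Hardy-Weinberg proportions, so a single sample is 0/0 with
    probability ref_freq ** 2. This is an illustrative assumption, not
    something taken from the data discussed in the thread.
    """
    p_hom_ref = ref_freq ** 2
    return (1.0 - p_hom_ref) ** n_samples


if __name__ == "__main__":
    # Hypothetical REF allele frequencies, 1200 samples as in the thread.
    for ref_freq in (0.5, 0.1, 0.05, 0.01):
        p = prob_no_hom_ref(ref_freq, 1200)
        print(f"REF allele frequency {ref_freq}: P(no 0/0 in 1200 samples) = {p:.3g}")
```

Under these assumptions a site only has a realistic chance of surviving the filter when the REF allele frequency is down at a few percent or less, so keeping a very small fraction of the variants is what one would expect.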