Hi all,
Is there a way to filter out all the homozygous genotypes in a multi-sample VCF file? My VCF file contains unknown genotypes (./.), reference genotypes (0/0), and two homozygous alternative genotypes (1/1 and 2/2). I want to remove these genotypes using the SnpSift.jar tool.
I have used the following command to remove them:
$ gunzip -c SNPs_reheader_annot_passed.vcf.gz | java -jar /home/bandiken/snpEff/SnpSift.jar filter \
"(countHet() > 0)" | gzip > SNPs_reheader_annot_Het.vcf.gz
However, in the final output file (see the screenshot), each row still contains the homozygous genotypes (unwanted in my study) alongside a few heterozygous genotypes.
So my question is: is this normal?
Is there a way to remove the homozygous genotypes of individual samples without removing the whole row?
Any help would be appreciated.
Thank you.
Well, the genotypes are laid out with samples in the columns and SNPs in the rows. If you want to remove them, you'll have to remove all the rows/columns with the 0/1 genotype. So do you mean to substitute the 0/1 with NA?
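To illustrate why whole rows survive the filter: an expression such as countHet() > 0 is evaluated once per variant line, so a row is kept with all of its sample columns as soon as one sample is heterozygous. Here is a toy Python sketch of that row-level logic. It is not SnpSift code; is_het and count_het are hypothetical helper names and the genotype list is made up for the example.

```python
def is_het(gt):
    """True when a diploid GT string such as '0/1' or '1|2' has two different, non-missing alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def count_het(genotypes):
    """Count the heterozygous calls in one row (one variant, many samples)."""
    return sum(is_het(gt) for gt in genotypes)


# One made-up row with five samples.
row = ["0/0", "0/1", "1/1", "./.", "1/2"]

# The decision is made for the row as a whole, so the homozygous
# and missing calls stay in the output next to the heterozygous ones.
keep_row = count_het(row) > 0
print(keep_row, row)
```

So seeing 0/0, 1/1, 2/2 and ./. in the filtered file is expected for a row-level filter.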
Thank you for your response. No, actually I was wondering whether I can remove all the homozygous genotypes so that the VCF file contains only heterozygous genotypes.
So is it possible to substitute all the homozygous genotypes (in my case 0/0, 1/1, 2/2, and ./.) with NA? I do not want any homozygous genotypes in my VCF file.
I only want heterozygous genotypes (0/1, 0/2, and 1/2) in my VCF file.
Thank you again.
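A note on the substitution question: VCF has no NA value, and a missing genotype is written as ./. in the GT field. A row-level filter cannot rewrite individual sample columns, so one possible approach is a small script that masks every non-heterozygous call. Below is a minimal Python sketch, assuming an uncompressed, tab-delimited VCF in which GT is the first FORMAT field. The names is_het and mask_non_het are hypothetical helpers, not part of SnpSift, and the example line is invented.

```python
def is_het(gt):
    """True when a diploid GT string such as '0/1' or '1|2' has two different, non-missing alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def mask_non_het(line, missing="./."):
    """Return a VCF line with every non-heterozygous GT replaced by `missing`."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # Columns 0-8 are CHROM..FORMAT; sample columns start at index 9.
    if len(fields) < 10 or fields[8].split(":")[0] != "GT":
        return line  # no samples or no GT field: leave the line alone
    for i in range(9, len(fields)):
        gt, sep, rest = fields[i].partition(":")
        if not is_het(gt):
            # Only the GT part is masked; other per-sample values (e.g. DP) are kept.
            fields[i] = missing + sep + rest
    return "\t".join(fields) + newline


# Invented example: five samples at one site with two ALT alleles.
example = "chr1\t100\t.\tA\tC,G\t50\tPASS\t.\tGT:DP\t0/0:10\t0/1:12\t2/2:8\t1/2:9\t./.:0\n"
masked = mask_non_het(example)
print(masked, end="")
```

Applying mask_non_het to each line of the file leaves 0/1, 0/2 and 1/2 untouched and turns 0/0, 1/1, 2/2 into ./. (haploid calls would be masked as well). Rows that end up with no heterozygous sample can still be dropped with the countHet() > 0 filter.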