Hi - I have a multi-subject VCF file and would like to set specific genotypes (GT) to missing for a set of subjects. However, the subjects that need to be set to missing differ from variant to variant.
For example, suppose I have this:
CHROM POS ID FORMAT sub1 sub2 sub3
22 12345 rs1111 GT 0/0 0/1 0/1
22 67891 rs2222 GT 1/1 0/0 0/0
22 45678 rs3333 GT 0/0 0/0 0/0
22 34567 rs4444 GT 0/1 0/0 0/1
I want to set rs1111 to missing for sub1, but then set rs3333 to missing for sub2 and sub3, so my results would look like this:
CHROM POS ID FORMAT sub1 sub2 sub3
22 12345 rs1111 GT ./. 0/1 0/1
22 67891 rs2222 GT 1/1 0/0 0/0
22 45678 rs3333 GT 0/0 ./. ./.
22 34567 rs4444 GT 0/1 0/0 0/1
I prefer using bcftools or vcftools, but am open to other ideas!
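As one "other idea", here is a minimal Python sketch of the operation described above. It is only an illustration under stated assumptions: the input is a standard tab-separated VCF with the nine fixed columns before the samples, FORMAT contains only GT (as in the toy example), and the `MASK` dictionary and the `mask_genotypes` name are hypothetical. The REF/ALT values in the demo lines are placeholders, not real data.

```python
# Hypothetical mask: variant ID -> sample names whose genotype should become missing.
MASK = {"rs1111": {"sub1"}, "rs3333": {"sub2", "sub3"}}

def mask_genotypes(lines, mask):
    """Yield VCF lines with the sample field set to ./. for masked (variant, sample) pairs.

    Assumes FORMAT is GT only, so the whole sample field is the genotype.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):          # meta-information lines pass through
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):      # header line: remember sample order
            samples = fields[9:]
            yield line
            continue
        targets = mask.get(fields[2], ())  # column 3 is the variant ID
        for i, name in enumerate(samples):
            if name in targets:
                fields[9 + i] = "./."
        yield "\t".join(fields)

demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsub1\tsub2\tsub3",
    "22\t12345\trs1111\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t0/1",   # REF/ALT are placeholders
    "22\t45678\trs3333\tA\tG\t.\t.\t.\tGT\t0/0\t0/0\t0/0",
]
for out in mask_genotypes(demo, MASK):
    print(out)
```

Because the mask is a dictionary lookup per record, every variant is handled in a single pass over the file.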
Wow, thanks! It took me a few days to get this installed and running, but it is doing something close to what I need now. Thank you so much!
I am not very familiar with Java, so I hope you don't mind me asking a couple of follow-up questions:
So in my 'real life' VCF, I have other fields in FORMAT besides GT. For example, I have GT:GQ:SQ:DP:CN. I would like only the GT column to be set to ./. but the other format fields to remain the same. Right now, all fields are being set to missing. For example, I would like 0/0:51:0:57:2 to be changed only to ./.:51:0:57:2 instead of ./.:.:.:.. I assume something in the java expression needs to change, but I am not sure what.
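To make the intended behaviour concrete, here is a small Python sketch (not the vcffilterjdk expression itself) that replaces only the GT subfield of one sample entry and leaves the other FORMAT values untouched. The function name `set_gt_missing` is hypothetical, and the sketch assumes a diploid `./.` is the desired missing value.

```python
def set_gt_missing(format_field, sample_field):
    """Return sample_field with only its GT value replaced by ./.

    format_field is the FORMAT column (e.g. "GT:GQ:SQ:DP:CN") and
    sample_field is one sample's colon-separated values.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "GT" not in keys:
        return sample_field              # nothing to mask
    values[keys.index("GT")] = "./."     # all other subfields stay as they are
    return ":".join(values)

print(set_gt_missing("GT:GQ:SQ:DP:CN", "0/0:51:0:57:2"))
```

For the example above this turns 0/0:51:0:57:2 into ./.:51:0:57:2.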
In my 'real life' VCF, I have about 500 variants that each need to be set to missing for between 1 and 20 subjects. I could run your command ~500 times, each time creating a new output VCF, but I wonder if there is a way to do all of this in a single run of vcffilterjdk without an extremely long -e expression. Any thoughts?
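One way to picture a single-run approach, independent of the tool used, is to keep the variant/subject pairs in a two-column table and load it once into a lookup structure. The sketch below is a Python illustration only; the tab-separated layout (variant ID, then sample name, one pair per row) and the `load_mask` name are assumptions.

```python
import csv
from collections import defaultdict

def load_mask(rows):
    """Build {variant_id: {sample, ...}} from tab-separated 'variant_id<TAB>sample' rows."""
    mask = defaultdict(set)
    for record in csv.reader(rows, delimiter="\t"):
        if not record or record[0].startswith("#"):
            continue                     # skip blank lines and comments
        variant_id, sample = record[0], record[1]
        mask[variant_id].add(sample)
    return dict(mask)

# Hypothetical table contents; in practice these rows would come from an open file.
rows = ["rs1111\tsub1", "rs3333\tsub2", "rs3333\tsub3"]
mask = load_mask(rows)
print(mask["rs3333"])
```

With ~500 variants the table stays small, and each VCF record needs only one dictionary lookup by ID, so the whole file can be processed in one pass.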
Again, thanks very much for your help!