Hi Folks,
I have two questions about a joint (multi-sample) VCF and would appreciate your help.
I need to flag SNPs and indels in the "FILTER" column (PASS or Low_confidence) of a joint VCF generated by GenotypeGVCFs. We call each family (trio) together, so a typical joint VCF contains calls for the child and both parents. I followed the hard-filtering rules proposed by GATK: http://gatkforums.broadinstitute.org/discussion/2806/howto-apply-hard-filters-to-a-call-set
Then I noticed that in the joint VCF, the INFO field is computed across all samples in the VCF. However, I want to set the "FILTER" column based on the child only (the Child column). How can I apply GATK SelectVariants to this joint VCF so that it uses information from the child only? Or are there other tools that would help?
I also want to filter the same joint VCF by DP in the Child column. How can I do that with GATK or any other tool? SelectVariants seems to use the DP in the "INFO" field, which is the sum of DP over all samples in the joint call. Any suggestions?
Thank you very much!
-Linda
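To make the per-sample DP question concrete, here is a minimal Python sketch (standard library only) that reads the DP value from one sample's genotype column instead of the INFO field and sets FILTER from it. It is not a GATK feature. The function name `flag_by_sample_dp`, the sample name `Child`, and the threshold of 10 are illustrative assumptions, and the sketch overwrites whatever FILTER value was already there.

```python
def flag_by_sample_dp(header_line, data_line, sample="Child", min_dp=10,
                      label="Low_confidence"):
    """Return data_line with FILTER set from one sample's FORMAT/DP.

    header_line is the '#CHROM ...' line, which names the sample columns.
    The sample name, min_dp and label are placeholders to adjust.
    """
    columns = header_line.rstrip("\n").split("\t")
    fields = data_line.rstrip("\n").split("\t")

    sample_idx = columns.index(sample)      # ValueError if the sample is absent
    keys = fields[8].split(":")             # FORMAT column, e.g. GT:AD:DP
    values = fields[sample_idx].split(":")  # this sample's genotype fields

    dp = None
    if "DP" in keys:
        i = keys.index("DP")
        # Trailing genotype fields may be omitted, and DP may be "."
        if i < len(values) and values[i] not in (".", ""):
            dp = int(values[i])

    # Assumption: a missing per-sample DP is treated as low confidence.
    fields[6] = "PASS" if dp is not None and dp >= min_dp else label
    return "\t".join(fields)


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tChild\tMother\tFather"
record = "1\t1000\t.\tA\tG\t50\t.\tDP=90\tGT:DP\t0/1:5\t0/0:40\t0/1:45"
print(flag_by_sample_dp(header, record).split("\t")[6])  # Low_confidence
```

In the example record the INFO DP is 90 but the child's own DP is 5, so the record is flagged even though the cohort-level depth looks fine.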
Thank you for your answer. The first question is about writing "PASS" or "Low_confidence" in the "FILTER" column based on filtering thresholds on QD and FS for SNPs and indels. For a single-sample VCF this is easy, but for a joint VCF I want to set the flag based on one sample in the joint VCF (for example, the child). However, GATK VariantFiltration uses the "INFO" column, which in a joint VCF is a summary of all samples in the VCF.
I hope that makes it clear. Any suggestions?
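For reference, the threshold logic itself can be sketched in a few lines of Python. This is only an illustration of the PASS / Low_confidence decision, not how VariantFiltration is implemented. The names `parse_info`, `filter_label` and `THRESHOLDS` are hypothetical, and the cutoff values are placeholders modeled on the hard-filter guidance linked in the original post, so check them against that page. The sketch also assumes the INFO string passed in already describes the sample of interest. Given the INFO field of a joint VCF, it would reproduce the cohort-level behaviour described above.

```python
# Placeholder cutoffs (assumption): verify against the GATK hard-filter guidance.
THRESHOLDS = {
    "snp": {"min_qd": 2.0, "max_fs": 60.0},
    "indel": {"min_qd": 2.0, "max_fs": 200.0},
}


def parse_info(info):
    """Turn 'QD=12.0;FS=3.1;DB' into {'QD': '12.0', 'FS': '3.1', 'DB': True}."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flag entries have no '=value'
    return out


def filter_label(ref, alt, info, label="Low_confidence"):
    """Return 'PASS' or label from the QD and FS annotations in info."""
    # Simplified type check: every allele one base long means SNP, else indel.
    is_snp = len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))
    cutoffs = THRESHOLDS["snp" if is_snp else "indel"]

    ann = parse_info(info)
    qd = float(ann["QD"]) if "QD" in ann else None
    fs = float(ann["FS"]) if "FS" in ann else None

    # Assumption: a missing annotation does not trigger the filter.
    fails = (qd is not None and qd < cutoffs["min_qd"]) or \
            (fs is not None and fs > cutoffs["max_fs"])
    return label if fails else "PASS"


print(filter_label("A", "G", "QD=1.5;FS=10.0"))    # Low_confidence
print(filter_label("AT", "A", "QD=12.0;FS=70.0"))  # PASS
```

The second call passes because the placeholder FS cutoff for indels (200.0) is looser than the one for SNPs (60.0). The same annotations on a SNP would be flagged.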