Hi All,
I have an aggregate VCF file with several samples and the following structure:
CHROM POS REF ALT SAMPLE1 SAMPLE2 ... SAMPLE_N
chr1 10 A T 0/0 0/1 ...0/0
chr1 30 T C 0/0 0/1 ...0/0
chr1 60 G T 1/0 0/0 ...0/0
I would like to check for compound heterozygosity in non-coding regions. I've selected the non-coding regions and extracted them from a gVCF file. The output I am looking for is something like:
SAMPLE2 chr1 10 AND chr1 30
The data are not phased, so compound het is not the right term; what I'd like to see is whether any variants co-occur in the same individuals. I tried genmod and gemini, but it looks like I can't check what I want with these tools.
I have no particular interest in any single SNP, only in the regions. So once I set the regions, I'd like to know whether any sample has 2 or more SNPs in the region under consideration.
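To make the goal concrete, here is a rough sketch of the check I have in mind. It assumes the simplified whitespace-separated layout shown above (CHROM POS REF ALT followed by one genotype column per sample), not a full VCF with ID/QUAL/FILTER/INFO/FORMAT columns. The function names and the region tuple format (chrom, start, end, 1-based inclusive) are made up for illustration.

```python
from collections import defaultdict


def is_variant(gt):
    """Return True if an unphased or phased genotype has a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def cooccurring_variants(lines, regions, min_variants=2):
    """Find samples carrying at least `min_variants` variant sites in a region.

    lines   : rows of the simplified table, header row first (assumed layout)
    regions : list of (chrom, start, end) tuples, 1-based inclusive (assumed)
    Returns {(sample, region): [(chrom, pos), ...]}.
    """
    rows = iter(lines)
    samples = next(rows).split()[4:]  # sample names follow CHROM POS REF ALT
    hits = defaultdict(list)
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        chrom, pos = fields[0], int(fields[1])
        for region in regions:
            r_chrom, start, end = region
            if chrom != r_chrom or not (start <= pos <= end):
                continue
            # Record this site for every sample with a non-reference genotype
            for sample, gt in zip(samples, fields[4:]):
                if is_variant(gt):
                    hits[(sample, region)].append((chrom, pos))
    # Keep only sample/region pairs with enough variant sites
    return {key: sites for key, sites in hits.items() if len(sites) >= min_variants}


if __name__ == "__main__":
    table = [
        "CHROM POS REF ALT SAMPLE1 SAMPLE2 SAMPLE3",
        "chr1 10 A T 0/0 0/1 0/0",
        "chr1 30 T C 0/0 0/1 0/0",
        "chr1 60 G T 1/0 0/0 0/0",
    ]
    regions = [("chr1", 1, 100)]  # hypothetical non-coding region
    for (sample, _region), sites in cooccurring_variants(table, regions).items():
        print(sample, " AND ".join(f"{c} {p}" for c, p in sites))
```

With the three example rows and a single region covering chr1:1-100, only SAMPLE2 is reported, because SAMPLE1 has just one variant site in the region. For a real multi-sample VCF the parsing part would need to be replaced by a proper VCF reader, but the per-region counting logic would stay the same.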
Do you have any suggestions?
Thanks!