I am trying to figure out how to extract shared overlapping intervals from WGS data using an arbitrary percentage threshold.
According to the bedtools documentation (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html), overlapping intervals can be extracted with bedtools intersect. This is very useful and works well for me when I have only a few samples.
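For interval data (say, one BED file per sample), a sweep line can report each segment together with how many samples cover it. This generalizes the all-samples intersection: requiring all n samples gives the strict shared interval, and a lower minimum relaxes it. Below is a minimal Python sketch. It assumes 0-based, half-open BED coordinates and that each sample's own intervals are already merged (non-overlapping). The name covered_by_at_least is hypothetical and is not part of bedtools. bedtools also has a multiinter tool that reports how many input files cover each region, which may do this directly.

```python
from collections import defaultdict


def covered_by_at_least(samples, min_samples):
    """Return (chrom, start, end, n_samples) segments covered by >= min_samples samples.

    samples: one list per sample of (chrom, start, end) intervals in 0-based,
    half-open BED coordinates. Each sample's own intervals are assumed to be
    merged (non-overlapping), so coverage depth equals the number of samples.
    """
    # Collect +1 at each interval start and -1 at each end, per chromosome.
    events = defaultdict(list)
    for intervals in samples:
        for chrom, start, end in intervals:
            events[chrom].append((start, 1))
            events[chrom].append((end, -1))

    segments = []
    for chrom in sorted(events):
        depth = 0
        prev = None
        # Sorting puts -1 before +1 at the same position, so abutting
        # intervals such as [a, b) and [b, c) are not counted as overlapping.
        for pos, delta in sorted(events[chrom]):
            # Emit the segment [prev, pos) if enough samples cover it.
            if prev is not None and pos > prev and depth >= min_samples:
                segments.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return segments


samples = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300)],
]
print(covered_by_at_least(samples, 3))  # all samples: [('chr1', 180, 200, 3)]
print(covered_by_at_least(samples, 2))  # at least 2 of 3 samples
```

With hundreds of samples, min_samples would be set from the desired percentage of the sample count. Segments are split at every interval boundary, so adjacent output rows can have different depths.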
However, I am analyzing several hundred samples, and no overlapping interval is detected. This is understandable. Say 99 samples have a T/A variant at Chr1 position 1 but one sample does not. Then no interval is shared by all samples. To get around this, I would like to extract variants that are shared by at least 99% of samples. If that finds nothing, I would try 95%, 90%, or even lower, until I find overlapping intervals.
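The percentage threshold at the variant level can be sketched in a few lines of Python. This sketch assumes each sample's variants have already been parsed (for example, from per-sample VCFs) into (chrom, pos, ref, alt) tuples. The function name shared_variants and this input layout are hypothetical, not a bedtools or GATK API. A variant missing from a sample's list counts as absent, so no-calls and homozygous-reference calls are not distinguished here.

```python
from collections import Counter


def shared_variants(sample_variants, min_percent):
    """Keep variants carried by at least min_percent % of samples.

    sample_variants: dict mapping sample name -> iterable of
    (chrom, pos, ref, alt) tuples.
    min_percent: threshold in percent, e.g. 99, 95, 90.
    Returns a sorted list of (variant, n_samples_carrying_it).
    """
    n_samples = len(sample_variants)
    counts = Counter()
    for variants in sample_variants.values():
        # set(): count each variant at most once per sample
        counts.update(set(variants))
    # Integer comparison avoids float rounding when min_percent is an int
    return sorted(
        (variant, n) for variant, n in counts.items()
        if n * 100 >= min_percent * n_samples
    )


# The example from the question: 99 of 100 samples carry T/A at chr1:1.
cohort = {f"s{i}": [("chr1", 1, "T", "A")] for i in range(99)}
cohort["s99"] = []
print(shared_variants(cohort, 100))  # []
print(shared_variants(cohort, 99))   # [(('chr1', 1, 'T', 'A'), 99)]
```

Lowering min_percent step by step (99, 95, 90, ...) reproduces the search described above without rereading the input.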
Does anyone know how to do this, or could you point me to helpful websites? Or could GATK SelectVariants do this?
Thank you!
Dear Pierre, Thank you very much for your quick & kind response. I appreciate it. I will give it a try tonight and let you know the results.