Hi all,
I have a VCF file with indels from a whole-genome analysis and I want to detect overlaps between indels. I tried bedtools intersect, but it asks for two files and I only have one. Before I try giving the same file twice to bedtools intersect, does anyone know a better way or a better tool to achieve this?
Thanks!
If you want to remove them, Galaxy has a tool called Delete Overlapping Indels.
Interesting, but my file is 10 GB. Does Galaxy support uploads of that size?
BEDOPS tools are designed to handle arbitrarily sized inputs and may be a useful alternative to uploading a 10 GB file. Please see my comment to Irsan's answer.
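If uploading is impractical, overlaps within a single file can also be found in one streaming pass, so the file size matters little. Below is a minimal Python sketch of that idea, not anyone's tool from this thread. It assumes the VCF is uncompressed, tab-delimited and sorted by chromosome and position, that an indel's footprint on the reference runs from POS to POS + len(REF) - 1, and that symbolic or missing ALT alleles can be ignored. The function names `indel_span` and `overlapping_indels` are made up for illustration.

```python
def indel_span(pos, ref, alts):
    """Return the (start, end) reference span, 1-based inclusive, or None if not an indel.

    Assumption: the span is POS..POS+len(REF)-1; symbolic (<DEL>), '*' and '.' ALTs are ignored.
    """
    plain = [a for a in alts if a not in ("*", ".") and not a.startswith("<")]
    if not any(len(a) != len(ref) for a in plain):
        return None  # SNV/MNV or nothing usable: not treated as an indel
    return pos, pos + len(ref) - 1


def overlapping_indels(lines):
    """Yield (chrom, earlier_span, later_span) for each overlapping indel pair.

    Assumption: records are sorted by chromosome, then by position.
    """
    active = []          # spans of earlier indels that may still overlap later ones
    current_chrom = None
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue     # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        span = indel_span(pos, ref, alt.split(","))
        if span is None:
            continue
        start, end = span
        if chrom != current_chrom:
            active, current_chrom = [], chrom   # new chromosome: forget old spans
        # Drop spans that end before this indel starts; they cannot overlap it.
        active = [(s, e) for (s, e) in active if e >= start]
        for s, e in active:
            yield chrom, (s, e), (start, end)
        active.append((start, end))
```

To use it, pass any iterable of lines, for example an open file handle, and print or count the yielded pairs. Memory use stays small because only the indels that can still overlap the current position are kept.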
Is the data phased? If so, you can use something like vcfgeno2haplo -w 1000, which will report on stderr when the indels are "impossible" (e.g. overlapping on the same haplotype).
Alternatively, you could call variants with a method that doesn't generate overlapping indels (a haplotype-based detection method) and ensure that the input is left-aligned and homogenized.