I am looking for code or tools to merge and de-duplicate long lists of structural variant calls in BEDPE or any related format.
My lines come from merging numerous genome datasets, and I want to identify redundant structural events: calls that often overlap at both ends, and sometimes at only one end but lead to a similar gene-disruption effect.
The author of Hydra (Aaron Q.) pointed me to a Python script that ships with his tool (dedupDiscordants.py). Thanks to Aaron's help I finally got it to work on my data, but the result is not deduplicated enough for my needs.
I would like independent control over each overlapping end, and over the operation applied to overlapping calls once they are found (for instance, merging to the shortest gap or to the shortest common flank pieces).
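A minimal sketch of the kind of paired intersection described above, in plain Python with no external tools. Everything here is hypothetical and only for illustration: the names `Call`, `same_event`, `merge_calls` and `dedup`, the `require` switch ("both" or "either" end must overlap), the `slop` padding in bp, and the two merge modes. It assumes "shortest gap" means taking the union of the overlapping arms (widest flanks, so the shortest gap between them) and "shortest common flank" means taking their intersection. It ignores strand and SV type, and the clustering is a greedy all-against-all pass, so it is order-dependent and too slow for very long lists.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    """One BEDPE-like call: left arm (1) and right arm (2), 0-based half-open."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int


def arms_overlap(ca, sa, ea, cb, sb, eb, slop=0):
    """True if two arms are on the same chromosome and overlap within `slop` bp."""
    return ca == cb and sa < eb + slop and sb < ea + slop


def same_event(a: Call, b: Call, require="both", slop=0) -> bool:
    """Compare each end independently; `require` is 'both' or 'either'."""
    left = arms_overlap(a.chrom1, a.start1, a.end1, b.chrom1, b.start1, b.end1, slop)
    right = arms_overlap(a.chrom2, a.start2, a.end2, b.chrom2, b.start2, b.end2, slop)
    return (left and right) if require == "both" else (left or right)


def merge_arm(ca, sa, ea, cb, sb, eb, mode, slop):
    """Merge one arm; keep the first call's arm when the arms do not overlap."""
    if not arms_overlap(ca, sa, ea, cb, sb, eb, slop):
        return ca, sa, ea
    if mode == "union":  # widest flank -> shortest gap between the two arms
        return ca, min(sa, sb), max(ea, eb)
    start, end = max(sa, sb), min(ea, eb)  # 'intersection': shortest common flank
    # Arms that only touch through `slop` share no bases: keep the first arm.
    return (ca, start, end) if start < end else (ca, sa, ea)


def merge_calls(a: Call, b: Call, mode="union", slop=0) -> Call:
    left = merge_arm(a.chrom1, a.start1, a.end1, b.chrom1, b.start1, b.end1, mode, slop)
    right = merge_arm(a.chrom2, a.start2, a.end2, b.chrom2, b.start2, b.end2, mode, slop)
    return Call(*left, *right)


def dedup(calls: List[Call], require="both", mode="union", slop=0) -> List[Call]:
    """Greedy clustering: fold each call into the first cluster it matches."""
    clusters: List[Call] = []
    for call in sorted(calls):
        for i, rep in enumerate(clusters):
            if same_event(rep, call, require, slop):
                clusters[i] = merge_calls(rep, call, mode, slop)
                break
        else:
            clusters.append(call)
    return clusters
```

With `require="both"` only calls sharing both breakpoints collapse; with `require="either"` a call sharing a single arm is folded in too, and the arm that does not overlap is taken from the call already in the cluster.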
Anyone with pieces of code to find paired intersections between double-coordinate calls (left arm / right arm of the junction) is welcome to comment and hopefully help. Thanks in advance.
Stephane
5 years later... I wrote a solution for this: http://crazyhottommy.blogspot.com/2016/03/breakpoints-clustering-for-structural.html