I have a dataframe with the coordinates of large mutations (CNVs):
chr start end
1 200 1000
1 400 800
1 600 1500
How can I identify zones that overlap and count occurrences?
The result for the dataframe given above would be something like this:
start end occurrences
200 800 2
600 1000 2
600 800 3
On the HPC where I work, I have a few Python libraries (e.g. pandas) and Bedtools. How could I do this?
That's what you're looking for!
How would you do that with bedtools intersect?
It's bad practice to ask a question without an output example. How exactly is the output supposed to be formatted? Check bedtools intersect as suggested, especially the counting option -wo. By the way, package managers like conda do not require root access, so you always have the option to install most of the software you want with them, even on HPCs.
I don't really know what you mean by "without an output example". I have provided the output I would like to achieve. The HPC I am using is the Genomics England Research Environment. On this HPC you cannot install anything that is not already provided by them, not even using Conda.
I have checked the -wo option, but the documentation mentions that you need two BED files. I only have one, so I am not sure how to do what you mean.
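For what it's worth, with a single set of intervals the pairwise overlaps can be listed by comparing the set against itself. Below is a minimal sketch in plain Python of that idea, not bedtools itself. The function name `pairwise_overlaps` is made up for illustration, and it assumes half-open coordinates (start inclusive, end exclusive) as in BED.

```python
from itertools import combinations


def pairwise_overlaps(intervals):
    """Return (chrom, start, end, overlap_bp) for every overlapping pair.

    `intervals` is a list of (chrom, start, end) tuples, assumed half-open.
    Hypothetical helper for illustration only.
    """
    overlaps = []
    for (c1, s1, e1), (c2, s2, e2) in combinations(intervals, 2):
        if c1 != c2:
            continue  # intervals on different chromosomes never overlap
        start, end = max(s1, s2), min(e1, e2)
        if start < end:  # non-empty intersection
            overlaps.append((c1, start, end, end - start))
    return overlaps


cnvs = [("1", 200, 1000), ("1", 400, 800), ("1", 600, 1500)]
for row in pairwise_overlaps(cnvs):
    print(*row, sep="\t")
```

This only reports overlaps between pairs, so it does not by itself say how many CNVs cover a given zone. It is also quadratic in the number of intervals, which may matter for large call sets.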
Your partitioning in the example output is incorrect, or at least inconsistent with the starting example dataframe. Nonetheless, BEDOPS bedops --partition is an easy way to do this correctly.
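If BEDOPS is not available, the partitioning idea can also be sketched in plain Python. This is a rough illustration and not the BEDOPS implementation. It cuts each chromosome at every interval boundary and counts how many input intervals cover each resulting piece. The name `partition_with_counts` is hypothetical, and coordinates are assumed to be half-open as in BED.

```python
from collections import defaultdict


def partition_with_counts(intervals):
    """Split intervals into disjoint pieces and count coverage of each piece.

    `intervals` is an iterable of (chrom, start, end) tuples, assumed
    half-open. Returns (chrom, start, end, occurrences) tuples.
    Hypothetical helper for illustration only.
    """
    # Turn each interval into a +1 event at its start and a -1 event at its end.
    events = defaultdict(list)
    for chrom, start, end in intervals:
        events[chrom].append((start, 1))
        events[chrom].append((end, -1))

    pieces = []
    for chrom in sorted(events):
        depth = 0    # number of intervals currently open
        prev = None  # previous boundary position
        for pos, delta in sorted(events[chrom]):
            # Emit the piece between two distinct boundaries if it is covered.
            if prev is not None and pos > prev and depth > 0:
                pieces.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return pieces


cnvs = [("1", 200, 1000), ("1", 400, 800), ("1", 600, 1500)]
for chrom, start, end, n in partition_with_counts(cnvs):
    print(chrom, start, end, n, sep="\t")
```

For the three CNVs in the question this yields five disjoint pieces: 200-400 (1), 400-600 (2), 600-800 (3), 800-1000 (2) and 1000-1500 (1). Keeping only the pieces with a count of 2 or more gives the overlapping zones. Rows from a pandas dataframe could be passed in as tuples, for example via `df.itertuples(index=False)`.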