I have a whitespace-delimited text file with the following format:
chrom start end num list lib1 lib2 lib3 +
chr1 4529048 4529082 3 lib1,lib2,lib3 1 1 1 +
chr1 4771642 4771666 3 lib1,lib2,lib3 1 1 1 +
chr1 4772370 4772405 3 lib1,lib2,lib3 1 1 1 +
(thousands of rows)
I need to do two things:
- Identify the rows whose start and end coordinates overlap
- For each group of overlapping rows, return the left-most and right-most coordinates
For example:
chrom start end num list lib1 lib2 lib3 +
chr1 1 5 3 lib1,lib2,lib3 1 1 1 +
chr1 4 6 3 lib1,lib2,lib3 1 1 1 +
chr1 7 9 3 lib1,lib2,lib3 1 1 1 +
In this example, the start and end coordinates of rows 1 and 2 overlap, so for those two rows the left-most coordinate is 1 and the right-most coordinate is 6. Row 3 does not overlap either of them.
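To make the expected behaviour concrete, here is a minimal Python sketch of the merging described above. It is only an illustration under some assumptions: the file is whitespace-delimited, coordinates are inclusive (so rows that share a single position count as overlapping), and rows are only compared within the same chromosome. The names `parse` and `merge_overlaps` are hypothetical helpers, not part of any existing tool.

```python
def parse(lines):
    """Yield (chrom, start, end) from whitespace-delimited lines.

    Lines whose second field is not an integer (header, blank lines)
    are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        yield fields[0], int(fields[1]), int(fields[2])


def merge_overlaps(rows):
    """Merge overlapping intervals per chromosome.

    Assumes inclusive coordinates: an interval overlaps the current
    merged interval when its start is <= the current end.
    """
    merged = []
    # Sorting by (chrom, start, end) puts candidate overlaps next to each other.
    for chrom, start, end in sorted(rows):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the current interval to the right-most end seen so far.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


example = """chrom start end num list lib1 lib2 lib3 +
chr1 1 5 3 lib1,lib2,lib3 1 1 1 +
chr1 4 6 3 lib1,lib2,lib3 1 1 1 +
chr1 7 9 3 lib1,lib2,lib3 1 1 1 +"""

for chrom, start, end in merge_overlaps(parse(example.splitlines())):
    print(chrom, start, end)
```

For the three example rows this gives one merged interval, chr1 1 6, and leaves chr1 7 9 on its own. The sketch ignores the strand column and the other fields, which may or may not matter for the real data.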
Does a bioinformatics tool exist that can solve this problem?
Thank you!