collapsing GFF coordinates with common start OR stop
6.3 years ago
Anand Rao ▴ 640

I have inherited some GFF files in which many features share a start or a stop coordinate. I need to collapse each such group to its longest member. An example problem and the expected solution are given below. Is there a readily available tool that can do this, or does someone have a Perl / Python / Bioconductor script for it? Thanks!

PROBLEM

case 1 - plus strand, common start coord

chr1 fBS CDS 1000 2000 + . PfamID1

chr1 fBS CDS 1000 3000 + . PfamID1

chr1 fBS CDS 1000 4000 + . PfamID1

chr1 fBS CDS 1000 5000 + . PfamID1

normal

chr2 fBS CDS 9000 10000 + . PfamID1

case 2 - minus strand, common end coord

chr4 fBS CDS 5000 1000 - . PfamID1

chr4 fBS CDS 4000 1000 - . PfamID1

chr4 fBS CDS 3000 1000 - . PfamID1

normal

chr9 fBS CDS 6431 15000 + . PfamID1

case 3 - plus strand, common end coord

chr10 fBS CDS 1000 5000 + . PfamID2

chr10 fBS CDS 2000 5000 + . PfamID2

chr10 fBS CDS 3000 5000 + . PfamID2

chr10 fBS CDS 4000 5000 + . PfamID2

case 4 - minus strand, common start coord

chr12 fBS CDS 5000 4000 - . PfamID2

chr12 fBS CDS 5000 3000 - . PfamID2

chr12 fBS CDS 5000 2000 - . PfamID2

SOLUTION - should contain only 6 lines after collapsing each of cases 1, 2, 3 and 4 into one line each

chr1 fBS CDS 1000 5000 + . PfamID1

chr2 fBS CDS 9000 10000 + . PfamID1

chr4 fBS CDS 5000 1000 - . PfamID1

chr9 fBS CDS 6431 15000 + . PfamID1

chr10 fBS CDS 1000 5000 + . PfamID2

chr12 fBS CDS 5000 2000 - . PfamID2

GFF parse script bedtools Perl • 1.1k views

chr4 fBS CDS 5000 1000 - . PfamID1

chr4 fBS CDS 4000 1000 - . PfamID1

chr4 fBS CDS 3000 1000 - . PfamID1

If I am not mistaken, in GFF the "start" coordinate must always be less than or equal to the "end" coordinate, so the lines above are not valid GFF.
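One way to repair such lines is to swap the two coordinates whenever the first is larger than the second. This is only a minimal sketch: `normalize_line` is a hypothetical helper name, and it assumes whitespace-separated fields in the column order shown in the question (coordinates in columns 4 and 5).

```python
def normalize_line(line):
    """Return the line with columns 4 and 5 swapped if start > end.

    Assumes whitespace-separated fields laid out as in the question;
    the output is tab-separated.
    """
    fields = line.split()
    start, end = int(fields[3]), int(fields[4])
    if start > end:
        # Strand stays in its own column, so swapping loses no information.
        fields[3], fields[4] = str(end), str(start)
    return "\t".join(fields)


fixed = normalize_line("chr4 fBS CDS 5000 1000 - . PfamID1")
print(fixed)
```

Lines that already have start less than or equal to end pass through unchanged apart from the tab separators.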

6.3 years ago
h.mon 35k

First make sure your GFF complies with the specification (see my comment above); then you can use bedtools merge.
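If you would rather script it, here is a rough pure-Python sketch of the collapsing step. To be clear, this is not what bedtools merge does (that merges overlapping intervals); it only groups records that share a start or an end. The helper names `parse` and `collapse` are made up for illustration, and the sketch assumes the whitespace-separated column order from the question and that records should only be collapsed when chromosome, strand and the last column all match.

```python
def parse(line):
    """Split one whitespace-separated line, using the column order from the question."""
    chrom, source, ftype, a, b, strand, score, attr = line.split()
    a, b = int(a), int(b)
    # Store coordinates so that start <= end, as GFF requires.
    return (chrom, source, ftype, min(a, b), max(a, b), strand, score, attr)


def collapse(records):
    """Keep the longest record per shared start, then per shared end."""
    by_start = {}
    for rec in records:
        key = (rec[0], rec[5], rec[7], rec[3])  # chrom, strand, attribute, start
        if key not in by_start or rec[4] > by_start[key][4]:
            by_start[key] = rec
    by_end = {}
    for rec in by_start.values():
        key = (rec[0], rec[5], rec[7], rec[4])  # chrom, strand, attribute, end
        if key not in by_end or rec[3] < by_end[key][3]:
            by_end[key] = rec
    return list(by_end.values())


lines = [
    "chr1 fBS CDS 1000 2000 + . PfamID1",
    "chr1 fBS CDS 1000 5000 + . PfamID1",
    "chr2 fBS CDS 9000 10000 + . PfamID1",
    "chr12 fBS CDS 5000 4000 - . PfamID2",
    "chr12 fBS CDS 5000 2000 - . PfamID2",
]
collapsed = collapse([parse(line) for line in lines])
for rec in collapsed:
    print("\t".join(map(str, rec)))
```

Note that the output always has start less than or equal to end, so minus-strand features will not look like the SOLUTION lines in the question. It is also two simple passes rather than a full transitive grouping, so check the result on chains of features that share a start with one record and an end with another.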
