How to find overlapping coordinates of a gtf file
Asked 3.4 years ago by Apex92

I am trying to list the entries of a GTF file (gencode.vM25.annotation.gtf) that overlap other entries in the same file. I have been looking for a tool that does this. Is there such a tool? Thanks

Tags: gtf-file, sequencing, genome, annotation, RNA-Seq

You should start with the BEDTools suite. It has many subcommands for working with coordinates (e.g. cluster, merge), and the documentation is easy to follow.
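To illustrate the idea behind cluster, here is a minimal pure-Python sketch: sort the features by sequence and start, then sweep along each sequence, opening a new cluster whenever a feature begins after the running cluster end. This is not bedtools' implementation. It assumes tab-separated GTF lines, ignores strand, and treats GTF coordinates as 1-based and fully closed. bedtools' default settings may also join book-ended (adjacent but non-overlapping) features, whereas this sketch only groups features that share at least one base. The names `parse_gtf` and `cluster` are hypothetical.

```python
"""Cluster overlapping GTF features: a sketch of the idea behind `bedtools cluster`.

Assumptions: tab-separated GTF, strand ignored, 1-based closed coordinates,
and only features sharing at least one base are clustered together.
"""
from typing import Iterable, List, Tuple

# (seqname, start, end, original line); start/end are 1-based and inclusive.
Feature = Tuple[str, int, int, str]


def parse_gtf(lines: Iterable[str]) -> List[Feature]:
    """Parse GTF lines, skipping blank lines and '#' comment lines."""
    features: List[Feature] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        features.append((fields[0], int(fields[3]), int(fields[4]), line))
    return features


def cluster(features: List[Feature]) -> List[Tuple[int, Feature]]:
    """Return (cluster_id, feature) pairs; transitively overlapping features share an id."""
    clustered: List[Tuple[int, Feature]] = []
    cluster_id = 0
    current_seq = None
    current_end = -1
    for feat in sorted(features, key=lambda f: (f[0], f[1], f[2])):
        seqname, start, end, _ = feat
        if seqname != current_seq or start > current_end:
            # New sequence, or a gap after the running cluster: open a new cluster.
            cluster_id += 1
            current_seq = seqname
            current_end = end
        else:
            # Shares at least one base with the cluster: extend its right edge.
            current_end = max(current_end, end)
        clustered.append((cluster_id, feat))
    return clustered


if __name__ == "__main__":
    example = [
        'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "A";',
        'chr1\tsrc\tgene\t150\t300\t.\t-\t.\tgene_id "B";',
        'chr1\tsrc\tgene\t400\t500\t.\t+\t.\tgene_id "C";',
        'chr2\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "D";',
    ]
    for cid, (seqname, start, end, _) in cluster(parse_gtf(example)):
        print(cid, seqname, start, end)
```

Any cluster id that appears more than once marks a group of mutually (or transitively) overlapping features.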


This is definitely helpful, thank you.


There are several tools for computing interval overlaps; bedtools and R/Bioconductor (rtracklayer + GenomicRanges) come to mind. Use the former on the command line, or Bioconductor if you are working in R.
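Both tools can report overlapping pairs when an annotation is compared against itself. The sketch below shows the underlying sweep in plain Python rather than either tool's API: features are sorted, and each one is paired with the earlier features on the same sequence that have not yet ended. It assumes features are already available as hypothetical `(seqname, start, end)` tuples in GTF's 1-based closed coordinates; strand is ignored, and self-hits and mirrored `(j, i)` duplicates are left out.

```python
"""List pairs of overlapping features within one annotation (sweep-line sketch).

Assumptions: features are (seqname, start, end) tuples in GTF's 1-based,
closed coordinates; strand is ignored; self-hits are not reported.
"""
from typing import List, Sequence, Tuple


def overlapping_pairs(features: Sequence[Tuple[str, int, int]]) -> List[Tuple[int, int]]:
    """Return sorted (i, j) index pairs, i < j, of features sharing at least one base."""
    order = sorted(range(len(features)), key=lambda k: (features[k][0], features[k][1]))
    pairs: List[Tuple[int, int]] = []
    active: List[int] = []  # earlier features on this sequence that may still overlap
    current_seq = None
    for i in order:
        seqname, start, _end = features[i]
        if seqname != current_seq:
            active = []
            current_seq = seqname
        # Features ending before this start cannot overlap it or anything sorted after it.
        active = [j for j in active if features[j][2] >= start]
        pairs.extend((min(i, j), max(i, j)) for j in active)
        active.append(i)
    return sorted(pairs)


if __name__ == "__main__":
    feats = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 300, 400), ("chr2", 100, 200)]
    print(overlapping_pairs(feats))
```

Reporting index pairs keeps the output small; the indices can be used to look up the original GTF rows.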

3.4 years ago

I assume you mean features in a GFF file that overlap other features in that same GFF file?

If so, have a look at AGAT; it has several sub-programs that can do this.

Here is more info: AGAT - Another Gff/Gtf Analysis Toolkit


Yes, exactly. I want to find the overlaps between features in a GFF file and other features in the same file. Thank you for your input.

3.4 years ago

You could use BEDOPS gtf2bed and bedmap to map the entries against themselves, then use awk to keep only the entries that overlap at least one other entry and cut to drop the count column:

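# gtf2bed converts GTF to sorted BED; bedmap with one input maps the file against itself.
# --delim '\t' makes bedmap separate its output columns with tabs (the default is '|'),
# so awk and cut can split on tabs. The count includes the entry itself,
# so a count above 1 means at least one other entry overlaps it.
gtf2bed < annotations.gtf \
    | bedmap --count --echo --echo-map --delim '\t' - \
    | awk -v FS="\t" -v OFS="\t" '($1 > 1)' \
    | cut -f2- \
    > answer.bed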
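The same logic can be sketched in plain Python for a small file. This illustrates what the pipeline computes and is not BEDOPS code: each GTF row is converted to a 0-based, half-open interval (as gtf2bed does), compared with every row on the same sequence, and kept only when its overlap count, which includes the row itself, exceeds 1. The `feature_type` argument is a hypothetical addition: in a GENCODE GTF every transcript and exon row lies within its parent gene row, so without restricting to one feature type (e.g. "gene") almost every row will be reported. The inner scan is quadratic in the worst case, so for a whole-genome annotation the BEDOPS pipeline is likely more practical.

```python
"""Self-overlap filter for GTF rows, mirroring the idea of the bedmap pipeline above.

Assumptions: tab-separated GTF, strand ignored, one shared base counts as overlap.
`feature_type` is a hypothetical option for keeping only one kind of row.
"""
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def gtf_to_bed(lines: Iterable[str], feature_type: Optional[str] = None) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (seqname, start, end, line), converting 1-based closed GTF to 0-based half-open BED."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if feature_type is not None and fields[2] != feature_type:
            continue
        yield fields[0], int(fields[3]) - 1, int(fields[4]), line


def self_overlaps(lines: Iterable[str], feature_type: Optional[str] = None) -> List[Tuple[str, List[str]]]:
    """Return (row, overlapping rows) for each row overlapping at least one other row."""
    by_seq: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for seqname, start, end, line in gtf_to_bed(lines, feature_type):
        by_seq[seqname].append((start, end, line))

    results: List[Tuple[str, List[str]]] = []
    for seqname in sorted(by_seq):
        intervals = sorted(by_seq[seqname])
        starts = [s for s, _, _ in intervals]
        for start, end, line in intervals:
            # Only rows starting before this one ends can overlap it (half-open intervals).
            hi = bisect.bisect_left(starts, end)
            hits = [other for _, e, other in intervals[:hi] if e > start]
            # Like `bedmap --count`, the count includes the row itself, so keep count > 1.
            if len(hits) > 1:
                results.append((line, hits))
    return results


if __name__ == "__main__":
    example = [
        'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "A";',
        'chr1\tsrc\tgene\t150\t300\t.\t-\t.\tgene_id "B";',
        'chr1\tsrc\tgene\t400\t500\t.\t+\t.\tgene_id "C";',
        'chr1\tsrc\texon\t120\t130\t.\t+\t.\tgene_id "A";',
    ]
    for row, hits in self_overlaps(example, feature_type="gene"):
        print(row, "overlaps", len(hits) - 1, "other row(s)")
```

As with --echo-map, each reported row's list of hits includes the row itself.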

