I am trying to list the entries of a GTF file (gencode.vM25.annotation.gtf) that overlap one another. I have been looking around for a tool that does this. Is there one? Thanks
I assume you mean features in a GFF file that overlap other features in that same GFF file?
If so, have a look at AGAT; it certainly has some sub-programs that can do this.
Here is more info: AGAT - Another Gff/Gtf Analysis Toolkit
You could use BEDOPS gtf2bed and bedmap to map entries to themselves, filtering out any that are disjoint with awk and cut:
$ gtf2bed < annotations.gtf \
| bedmap --count --echo --echo-map --delim '\t' - \
| awk -v FS="\t" -v OFS="\t" '($1 > 1)' \
| cut -f2- \
> answer.bed
Start with the bedtools suite. It has many subcommands for working with coordinates, e.g. cluster and merge, and the documentation is really easy to follow.
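As a rough sketch of how this could look with bedtools (assuming it is installed and on your PATH), you can intersect the GTF with itself and drop the lines where a record simply matches itself. The function name `list_overlaps` and the output file name are made up for illustration; `intersect`, `-a`, `-b`, `-wa` and `-wb` are standard bedtools options.

```shell
# Sketch: report every pair of distinct GTF records that share at least 1 bp.
# Assumes bedtools is installed; list_overlaps is an illustrative name.
list_overlaps() {
  gtf="$1"
  # -wa -wb prints the A record (columns 1-9) followed by the B record (10-18).
  bedtools intersect -a "$gtf" -b "$gtf" -wa -wb \
    | awk -F'\t' '{
        a = $1; b = $10
        for (i = 2; i <= 9; i++) { a = a FS $i; b = b FS $(i + 9) }
        if (a != b) print    # drop each record matching itself
      }'
}

# Usage (hypothetical output name):
#   list_overlaps gencode.vM25.annotation.gtf > overlapping_pairs.tsv
```

Note that a GENCODE annotation nests exons inside transcripts inside genes, so running this on the whole file will report those parent/child overlaps as well. If you only care about, say, gene-level overlaps, it is probably worth filtering on column 3 first (e.g. `awk -F'\t' '$3 == "gene"'`) and passing the filtered file in.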
This is definitely helpful - thank you
There are some tools for finding interval overlaps; bedtools and R/Bioconductor (rtracklayer + GenomicRanges) come to mind. If you want to do it on the command line, use the former; inside R, use Bioconductor.