4.5 years ago
massa.kassa.sc3na
Hi,
I'm looking for a command-line tool or library for working with sequence annotations. It should support the following:
- importing annotations from .gff / .gtf files (other common formats are also fine)
- importing some very basic data format (like CSV) to read user-defined data (regions of interest, ROI)
- providing functions for:
  - finding annotations upstream / downstream of an ROI
  - finding annotations within a certain range of an ROI, or overlapping it
  - filtering the annotations based on any criteria (sequence annotations, mutual distance or overlaps, strand, etc.)
  - detecting duplicated entries
  - specifying a "wiggle" room (tolerance) for these operations
  - exporting selected annotations
For now I'm using BioPython + pandas, but I'm wondering whether there is a better solution.
Thanks
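For comparison with the BioPython + pandas approach, here is a minimal sketch of a few of the listed requirements (overlap with a wiggle room, strand filtering, duplicate detection, export) using only the Python standard library. It assumes a 9-column tab-separated GFF/GTF input and a hypothetical ROI CSV with the columns name,seqid,start,end; the names Feature, read_gff, read_rois, overlapping, duplicates and to_bed are illustrative only and do not come from any library.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    seqid: str
    ftype: str
    start: int  # 1-based, inclusive (GFF/GTF convention)
    end: int  # 1-based, inclusive
    strand: str
    attributes: str


def read_gff(lines):
    """Yield Features from GFF/GTF lines, skipping comments, blanks and malformed rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # this sketch silently ignores rows without 9 columns
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        yield Feature(seqid, ftype, int(start), int(end), strand, attrs)


def read_rois(handle):
    """Read ROIs from a CSV with the hypothetical columns name,seqid,start,end."""
    for row in csv.DictReader(handle):
        yield row["name"], row["seqid"], int(row["start"]), int(row["end"])


def overlapping(features, seqid, start, end, wiggle=0):
    """Features on seqid overlapping the closed interval [start - wiggle, end + wiggle]."""
    lo, hi = start - wiggle, end + wiggle
    return [f for f in features if f.seqid == seqid and f.start <= hi and f.end >= lo]


def duplicates(features):
    """Features whose (seqid, type, start, end, strand) was already seen earlier."""
    seen, dups = set(), []
    for f in features:
        key = (f.seqid, f.ftype, f.start, f.end, f.strand)
        if key in seen:
            dups.append(f)
        seen.add(key)
    return dups


def to_bed(f):
    """BED6 line; BED is 0-based half-open, so only the start shifts by one."""
    return f"{f.seqid}\t{f.start - 1}\t{f.end}\t{f.ftype}\t0\t{f.strand}"


# Tiny hypothetical inputs, for illustration only.
GFF = (
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n"
    "chr1\tsrc\tgene\t250\t300\t.\t-\t.\tID=g2\n"
    "chr1\tsrc\tgene\t250\t300\t.\t-\t.\tID=g2b\n"
)
ROIS = "name,seqid,start,end\nroi1,chr1,205,240\n"

features = list(read_gff(io.StringIO(GFF)))
for name, seqid, start, end in read_rois(io.StringIO(ROIS)):
    hits = overlapping(features, seqid, start, end, wiggle=10)
    on_plus = [f for f in hits if f.strand == "+"]  # example strand filter
    for f in on_plus:
        print(name, to_bed(f))
```

The linear scans keep the sketch short; for genome-scale inputs an interval index (or a dedicated tool) would be needed.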
One tool from @Juke-34 that may be of interest: AGAT
Thank you, I've skimmed through the documentation and this ticks some of the boxes. I'll probably use it some day.
It's very unlikely you're going to find a single ready-made tool that can do all of these things. Off the top of my head, some tools that might address one or more of them are:
Cloning software is usually pretty good for manipulating annotated sequences. Unfortunately, several of the tools in that list are not free.
Thank you for the suggestions, I'll look into them.
While searching for something else, I stumbled upon the bedtools programs intersect and closest, which are both along the lines of what I was looking for. Leaving this comment for anybody who might find it useful. Links:
https://bedtools.readthedocs.io/en/latest/content/tools/closest.html
https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
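To illustrate the idea behind closest, here is a hypothetical plain-Python sketch; it is not bedtools' implementation and its conventions are assumptions. "Upstream" means lower reference coordinates (loosely like the -D ref option of bedtools closest), strand is ignored, features overlapping the ROI are excluded (that is what an overlap query such as intersect is for), coordinates are 1-based inclusive as in GFF, and ties keep the first feature seen. Interval, gap and closest_flanking are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    name: str
    seqid: str
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive


def gap(a: Interval, b: Interval) -> int:
    """Number of bases strictly between a and b; 0 if they touch or overlap."""
    if b.start > a.end:
        return b.start - a.end - 1
    if a.start > b.end:
        return a.start - b.end - 1
    return 0


def closest_flanking(roi: Interval, features, max_dist: Optional[int] = None):
    """Nearest non-overlapping feature on each side of roi on the same seqid.

    Upstream = entirely at lower coordinates, downstream = entirely at higher
    coordinates (strand ignored). max_dist optionally drops features that are
    farther away than that many bases.
    """
    up = down = None
    for f in features:
        if f.seqid != roi.seqid:
            continue
        if f.end < roi.start:  # entirely left of the ROI
            if up is None or f.end > up.end:
                up = f
        elif f.start > roi.end:  # entirely right of the ROI
            if down is None or f.start < down.start:
                down = f
        # features overlapping the ROI fall through and are ignored here
    if max_dist is not None:
        up = up if up is not None and gap(up, roi) <= max_dist else None
        down = down if down is not None and gap(roi, down) <= max_dist else None
    return up, down


# Hypothetical example data.
roi = Interval("roi1", "chr1", 205, 240)
feats = [
    Interval("g1", "chr1", 100, 200),
    Interval("g2", "chr1", 250, 300),
    Interval("g3", "chr1", 10, 50),
    Interval("g4", "chr2", 210, 220),
]
up, down = closest_flanking(roi, feats)
print(up.name, gap(up, roi), down.name, gap(roi, down))
```

The max_dist parameter plays the role of a "wiggle" room on the distance; a strand-aware variant would swap upstream and downstream for ROIs on the minus strand.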