Tool: Efficient querying of genomic reference databases with gget
2.5 years ago

gget enables efficient querying of genomic databases, such as Ensembl, UniProt, and NCBI, directly into a Python or command-line programming environment. It was designed to support genomic data analysis.

A recurring challenge in interpreting genomic data (such as single-cell RNA-seq data) is the assessment of results in the context of existing reference databases. gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic reference databases, such as Ensembl. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying required for genomic data analysis in a single line of code.

Check out the preprint.
Source code and user manual.
Twitter thread with examples.


ensembl enrichr ncbi gget uniprot • 2.1k views

Thank you so much for posting this! I have used it in one of my projects and it has proven so useful.

2.5 years ago

There is substantial overlap between the functionality of gget and what bio does:

https://www.bioinfo.help/

for example:

gget seq -id ENST00000288602

in bio looks like this:

bio fetch ENST00000288602
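
For readers curious what a one-line fetch like this involves, here is a minimal Python sketch that requests the same transcript from the public Ensembl REST API. It is only an illustration under stated assumptions, not how gget or bio is implemented; the helper names sequence_url and fetch_fasta are hypothetical, and the download step needs network access.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Public Ensembl REST server; the sequence/id endpoint returns the
# sequence for a single Ensembl identifier.
ENSEMBL_REST = "https://rest.ensembl.org"


def sequence_url(ensembl_id: str) -> str:
    """Build the request URL for one Ensembl ID (hypothetical helper)."""
    # content-type=text/x-fasta asks the server for FASTA output.
    return (
        f"{ENSEMBL_REST}/sequence/id/{quote(ensembl_id, safe='')}"
        "?content-type=text/x-fasta"
    )


def fetch_fasta(ensembl_id: str, timeout: float = 30.0) -> str:
    """Download the sequence as FASTA text (hypothetical helper, needs network)."""
    with urlopen(sequence_url(ensembl_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call, not executed here because it contacts a remote server:
# print(fetch_fasta("ENST00000288602"))
```

Both command-line tools hide this kind of request, along with error handling and output formatting, behind a single command.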

Beyond connecting to web APIs to download data with bio fetch and metadata with bio search, the bio package supports a broad set of file format transformations, implements a novel data representation for hierarchical data, and provides high-performance traversal of taxonomy, lineage, and Gene Ontology networks from that local representation, all without connecting to remote sites.

I would appreciate a mention with a citation for bio as prior work in the domain. As for citation, I would suggest either the

or either the source code on GitHub or the website I linked above: https://www.bioinfo.help/

Edit (based on comments below):

Upon further consideration, I would recommend citing the GitHub repository:


Hi Istvan,

Thanks for pointing to the bio tool. I was not aware of it. On looking at it briefly, it seems like a great tool that can do many things that gget can't (gff visualization, display taxonomy, reformat alignment download from GenBank / SRA etc.). On the downloading part there may be some overlap with ffq (https://www.biorxiv.org/content/10.1101/2022.05.18.492548v2), which you seem to have cited on the bioinfo.help website (thank you!).

I haven't had a chance to look at the book you linked to (it seems to be behind a paywall). From looking at the GitHub repo, I don't see "substantial" overlap at all; just some common functionality between bio fetch and gget seq (all the other things gget does, such as extracting isoforms, amino acids, etc., are things I don't think bio does), and between gget search and bio search (it looks like bio requires downloading a whole database, though).

In any case, it's great to know about the Biostar book to cite. Thanks again, Lior


Thank you for the note and explanation.

The software is open-source with the most permissive MIT license. So are the website and the manual. The book just follows the standard scientific practice where the most selective resources are behind paywalls :-)

That being said, I would like to correct my recommendation. My first suggestion was not well thought out; it was motivated mainly by wanting to collect more citations to the book.

For the reader's sake, citing the GitHub repository would be more appropriate, as it would always represent the most up-to-date information:

I will explicitly add this recommendation to the documentation.


OK, thanks. Makes sense.
