I would like to share a little utility that I wrote to make downloading genome sequences less of a hassle.
Genomepy is a simple software package for downloading genome sequences. It provides both command-line tools and a Python application programming interface (API). It supports several genome providers, which currently include UCSC, NCBI and Ensembl. Downloaded genome sequences can be soft- or hard-masked, and specific chromosomes or scaffolds can be included or excluded based on regular expressions. Genomepy is free and open-source software and can be installed through standard package managers (bioconda, pip).
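To make the masking and regex filtering ideas concrete, here is a minimal standalone sketch. It is not genomepy's internal code, and the function names `parse_fasta` and `filter_and_mask` are hypothetical. It assumes the common convention that soft-masked (repeat) bases are written in lowercase, so hard-masking means replacing every lowercase base with N. A regular expression on the sequence name either keeps matching chromosomes or, with `invert=True`, drops them.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    The name is the first whitespace-separated word of the header line.
    """
    name: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_and_mask(records: Iterable[Record], regex: Optional[str] = None,
                    invert: bool = False,
                    hard_mask: bool = False) -> Iterator[Record]:
    """Keep records whose name matches `regex` (or drop them if `invert`).

    With `hard_mask=True`, soft-masked (lowercase) bases become 'N';
    otherwise the sequence is returned unchanged (still soft-masked).
    """
    pattern = re.compile(regex) if regex else None
    for name, seq in records:
        if pattern is not None:
            matched = pattern.search(name) is not None
            # Skip when the match result equals `invert`:
            # no match in keep-mode, or a match in exclude-mode.
            if matched == invert:
                continue
        if hard_mask:
            seq = "".join("N" if base.islower() else base for base in seq)
        yield name, seq
```

For example, `filter_and_mask(records, regex=r"_random$|^chrM$", invert=True, hard_mask=True)` would drop unplaced `_random` scaffolds and the mitochondrial genome and hard-mask everything that remains.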
The GitHub repository, including documentation, is here: https://github.com/simonvh/genomepy
The JOSS publication is here: http://dx.doi.org/10.21105/joss.00320
Hope you find it useful.
I just installed it. This is taking very long; is this expected?
Also, is the search case-sensitive? And one last question: do I need to provide the complete name, i.e. Rattus norvegicus instead of just rattus?
EDIT 1: And finally, I get this error. Any clues?
The first time you run a search it will take a long time, because downloading the genome information from some providers is slow. This list is cached locally, so subsequent queries should be faster. The cache expires after a week, so once in a while a search will take longer again.
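The caching behaviour described here (slow first fetch, fast repeats, refresh after a week) can be illustrated with a small time-to-live cache. This is a hypothetical sketch, not genomepy's implementation: the JSON file, the `fetch` callback and the function name `cached_genome_list` are assumptions made for illustration. The cache age is taken from the file's modification time.

```python
import json
import os
import time
from typing import Callable, List, Optional

ONE_WEEK = 7 * 24 * 60 * 60  # cache lifetime in seconds


def cached_genome_list(path: str, fetch: Callable[[], List[dict]],
                       max_age: float = ONE_WEEK,
                       now: Optional[float] = None) -> List[dict]:
    """Return the genome list cached at `path` if it is younger than
    `max_age` seconds; otherwise call the (slow) `fetch` and re-cache."""
    now = time.time() if now is None else now
    if os.path.exists(path) and now - os.path.getmtime(path) < max_age:
        # Fast path: the cached copy is still fresh.
        with open(path) as fh:
            return json.load(fh)
    # Slow path: first run, or the cache is older than `max_age`.
    genomes = fetch()
    with open(path, "w") as fh:
        json.dump(genomes, fh)
    return genomes
```

Only the first call (and the first call after the cache expires) pays the cost of `fetch`; every other call just reads the local file.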
The search is not case-sensitive, and you can be as (un)specific as you want: it does a full-text search of all fields. For instance, searching for just rattus is enough; you do not need the full species name.
With regards to the error, can you reach this URL in your web browser: http://genome.ucsc.edu/cgi-bin/das/dsn ?
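On the search question, a case-insensitive, full-text match over all fields can be sketched as follows. This is an illustration of the idea only, not genomepy's search code; the function name `search_genomes` and the record fields (`name`, `species`) are hypothetical. A record matches if the lowercased search term occurs anywhere in any of its values, which is why a partial term such as rattus is enough.

```python
from typing import Dict, Iterable, Iterator


def search_genomes(term: str,
                   genomes: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield every record in which `term` occurs, ignoring case,
    as a substring of any field value."""
    needle = term.casefold()
    for record in genomes:
        if any(needle in str(value).casefold() for value in record.values()):
            yield record
```

With records such as `{"name": "rn6", "species": "Rattus norvegicus"}` and `{"name": "mm10", "species": "Mus musculus"}`, the terms rattus, RATTUS and Rattus norvegicus would all return only the rn6 record.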
I can access that link in my browser, and the page loads.
Do you know if you are behind a proxy by any chance?
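One way to check this from Python itself is sketched below. It relies only on the standard library: `urllib.request.getproxies()` reports the proxies Python would use, read from `*_proxy` environment variables first and, on macOS and Windows, from the system settings when none are set. Whether genomepy's downloads honour the same settings is an assumption I have not verified, and the helper names `configured_proxies` and `can_reach` are hypothetical.

```python
import urllib.request
from typing import Dict


def configured_proxies() -> Dict[str, str]:
    """Return the proxies urllib would use, keyed by scheme
    (e.g. 'http', 'https'). An empty dict means none were found."""
    return urllib.request.getproxies()


def can_reach(url: str, timeout: float = 10.0) -> bool:
    """Try to open `url` from Python, using the same proxy settings.

    Returns False on network errors (URLError and timeouts are OSError
    subclasses) and on malformed URLs (ValueError).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        return False
```

If `configured_proxies()` is empty but `can_reach("http://genome.ucsc.edu/cgi-bin/das/dsn")` returns False while the browser can open the page, the browser is probably using a proxy that the command line does not know about.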