Tool: pyGeno 1.2: Python package for Personalized Genomics and Proteomics
9.7 years ago
Tariq Daouda

pyGeno 1.2 is now available: http://pyGeno.iric.ca.

pyGeno is a Python package that allows you to easily combine reference genomes and sets of polymorphisms to create personalized genomes. Personalized genomes let you work directly on the genomes of your subjects, and they can be translated into personalized proteomes.
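To make the idea concrete, here is a minimal, hypothetical sketch in plain Python (not pyGeno's code or API) of what "personalizing" a sequence means: apply single-nucleotide substitutions to a reference coding sequence, then translate it with the standard genetic code. It assumes 0-based positions within the coding sequence and handles substitutions only (no indels).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def apply_snps(reference, snps):
    """Return a personalized copy of `reference`.

    `snps` maps a 0-based position to the alternative base (substitutions only).
    """
    seq = list(reference)
    for pos, alt in snps.items():
        seq[pos] = alt
    return "".join(seq)


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # "X" for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

With the toy sequence ATGGCTTGGTAA, the reference translates to MAW, while a G-to-C substitution at position 7 turns the codon TGG into TCG, so the personalized protein becomes MAS.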

Multiple sets of polymorphisms can also be combined to leverage their independent benefits, for example:

  • RNA-seq and DNA-seq for the same individual, to improve coverage
  • RNA-seq of an individual + dbSNP, for validation
  • The RNA-seq results of several individuals, combined to create a genome containing only the common polymorphisms
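The combinations above boil down to set operations on variant calls. A minimal sketch, assuming each call is a hypothetical (chromosome, position, alternative allele) tuple; this only illustrates the set logic, not how pyGeno implements it:

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of one variant call: (chromosome, position, alt allele).
Call = Tuple[str, int, str]


def union_calls(*call_sets: Iterable[Call]) -> Set[Call]:
    """Calls seen in any set, e.g. RNA-seq + DNA-seq of one individual for more coverage."""
    result: Set[Call] = set()
    for calls in call_sets:
        result.update(calls)
    return result


def common_calls(*call_sets: Iterable[Call]) -> Set[Call]:
    """Calls present in every set, e.g. shared polymorphisms, or calls also reported by dbSNP."""
    sets = [set(calls) for calls in call_sets]
    if not sets:
        return set()
    return set.intersection(*sets)
```

Intersecting one individual's RNA-seq calls with dbSNP keeps only the calls dbSNP also reports, and intersecting several individuals' RNA-seq calls keeps the polymorphisms they have in common; both match on position and allele here.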

pyGeno is also a personal database that gives you access to all the information provided by Ensembl (for both reference and personalized genomes) without querying remote HTTP APIs. This allows for much faster and more reliable genome-wide study pipelines.
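A rough, standard-library sketch of the "local database" idea: parse the gene records of an Ensembl-style GTF once into a local SQLite table, then answer lookups without any network round trip. The table layout and function names are hypothetical; pyGeno's own schema is different.

```python
import re
import sqlite3

# GTF attribute column looks like: gene_id "X"; gene_name "Y";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def load_genes(conn, gtf_lines):
    """Store the 'gene' records of a GTF in a local table; return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "gene_id TEXT PRIMARY KEY, name TEXT, chrom TEXT, "
        "start_pos INTEGER, end_pos INTEGER, strand TEXT)"
    )
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        rows.append((attrs.get("gene_id"), attrs.get("gene_name"),
                     fields[0], int(fields[3]), int(fields[4]), fields[6]))
    conn.executemany("INSERT OR REPLACE INTO genes VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS genes_by_name ON genes (name)")
    conn.commit()
    return len(rows)


def find_gene(conn, name):
    """Look a gene up by name in the local table; no HTTP request is involved."""
    return conn.execute(
        "SELECT gene_id, chrom, start_pos, end_pos, strand FROM genes WHERE name = ?",
        (name,),
    ).fetchone()
```

The import cost is paid once; every later query hits a local, indexed file.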

It also comes with parsers for several file types and various other useful tools.

Tags: python, SNP, rna-seq, dbSNP, ensembl

This sounded like a cool tool, but I was unable to run it at all. The installation fails on my machine right away:

https://github.com/tariqdaouda/pyGeno/issues/2

Also, I strongly recommend decoupling the data download from the Python code. Python is not all that well suited to downloading massive datasets; at the very least, provide alternative sources for the data via HTTP, rsync, or BitTorrent.


Thank you for bringing that up; the pip version was lagging behind. It is fixed now, but I recommend the git version.

I had a look at the issue: the problem was that the folders containing the datawraps were not included in the pip version. The rest of the installation went fine, though, and you can import datawraps using the importation module.

I would nonetheless recommend that you either update pyGeno to the latest pip version to get the missing datawraps:

pip install --upgrade pyGeno

Or switch to the git version to get the latest bleeding-edge updates.

Python is used for downloads to avoid dependencies on third-party software, in order to keep the installation as simple as possible. That is also why pyGeno comes with its own set of parsers.
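For reference, a dependency-free download in Python can stream a large file to disk in chunks rather than reading it into memory. This is a generic Python 3 standard-library sketch, not pyGeno's download code; the helper name and the ".part" temporary suffix are illustrative choices.

```python
import os
import shutil
import urllib.request


def download(url, dest_path, chunk_size=1 << 20):
    """Stream `url` to `dest_path` in chunks (1 MiB by default)."""
    tmp_path = dest_path + ".part"
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    # Rename only after the transfer completes, so an interrupted download
    # never leaves a truncated file under the final name.
    os.replace(tmp_path, dest_path)
    return dest_path
```

Memory use stays bounded by the chunk size regardless of file size; the trade-off against rsync or BitTorrent is that there is no built-in resume.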

The datawraps shipped with the bootstrap module only contain links to data made available by third parties such as Ensembl and dbSNP. You can also create your own datawraps by downloading the files independently and including them in the tar.gz archive, as explained here and here.
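As an illustration of packaging locally downloaded files, Python's standard tarfile module can build the archive. The flat layout below (manifest.ini at the archive root, next to the data files it lists) and the helper name are assumptions, so check the datawrap documentation for the exact structure pyGeno expects.

```python
import os
import tarfile


def build_datawrap(archive_path, manifest_path, data_files):
    """Pack a manifest and local data files into a .tar.gz (assumed flat layout)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(manifest_path, arcname="manifest.ini")
        for path in data_files:
            # Store each file under its bare name so the manifest can refer to it directly.
            tar.add(path, arcname=os.path.basename(path))
    return archive_path
```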

That being said, pyGeno has been tested many times with both Ensembl and dbSNP, and we have never run into any problem due to the initial downloads.

Thanks


Thanks for the fix. I like the concepts behind this package and want to test it out in practice. More feedback to follow.


Thank you, your feedback is greatly appreciated.

7.3 years ago

Hi Tariq,

I have a question about importing genome data into pyGeno. The human reference sequence data was downloaded locally on our HPC cluster, and I modified the manifest.ini file as shown below. When I import the genome, it reports the error "sqlite3.OperationalError: disk I/O error". However, there is enough free disk space on the HPC. Could you tell me how to fix this issue?

The platform I use is Python 2.7.13 / pyGeno 1.3.1 on CentOS Linux release 7.3.1611 (Core).

Thank you very much.

Hao

manifest.ini

[package_infos]
description = Human reference genome
maintainer = Tariq Daouda
maintainer_contact = tariq.daouda@umontreal.ca
version = 1

[genome]
species = human
name = GRCh37.75
source = http://useast.ensembl.org/info/data/ftp/index.html

[chromosome_files]
10 = Homo_sapiens.GRCh37.75.dna.chromosome.10.fa.gz
11 = Homo_sapiens.GRCh37.75.dna.chromosome.11.fa.gz
12 = Homo_sapiens.GRCh37.75.dna.chromosome.12.fa.gz
13 = Homo_sapiens.GRCh37.75.dna.chromosome.13.fa.gz
14 = Homo_sapiens.GRCh37.75.dna.chromosome.14.fa.gz
15 = Homo_sapiens.GRCh37.75.dna.chromosome.15.fa.gz
16 = Homo_sapiens.GRCh37.75.dna.chromosome.16.fa.gz
17 = Homo_sapiens.GRCh37.75.dna.chromosome.17.fa.gz
18 = Homo_sapiens.GRCh37.75.dna.chromosome.18.fa.gz
19 = Homo_sapiens.GRCh37.75.dna.chromosome.19.fa.gz
1 = Homo_sapiens.GRCh37.75.dna.chromosome.1.fa.gz
20 = Homo_sapiens.GRCh37.75.dna.chromosome.20.fa.gz
21 = Homo_sapiens.GRCh37.75.dna.chromosome.21.fa.gz
22 = Homo_sapiens.GRCh37.75.dna.chromosome.22.fa.gz
2 = Homo_sapiens.GRCh37.75.dna.chromosome.2.fa.gz
3 = Homo_sapiens.GRCh37.75.dna.chromosome.3.fa.gz
4 = Homo_sapiens.GRCh37.75.dna.chromosome.4.fa.gz
5 = Homo_sapiens.GRCh37.75.dna.chromosome.5.fa.gz
6 = Homo_sapiens.GRCh37.75.dna.chromosome.6.fa.gz
7 = Homo_sapiens.GRCh37.75.dna.chromosome.7.fa.gz
8 = Homo_sapiens.GRCh37.75.dna.chromosome.8.fa.gz
9 = Homo_sapiens.GRCh37.75.dna.chromosome.9.fa.gz
MT = Homo_sapiens.GRCh37.75.dna.chromosome.MT.fa.gz
X = Homo_sapiens.GRCh37.75.dna.chromosome.X.fa.gz
Y = Homo_sapiens.GRCh37.75.dna.chromosome.Y.fa.gz

[gene_set]
gtf = Homo_sapiens.GRCh37.75.gtf.gz
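Before importing a locally modified package like this one, it can help to confirm that every file the manifest names is actually present next to manifest.ini. A small Python 3 sketch using configparser; the function name is hypothetical, and it only checks the [chromosome_files] and [gene_set] sections shown above.

```python
import configparser
import os


def missing_manifest_files(package_dir):
    """List files named in <package_dir>/manifest.ini that are not present in package_dir."""
    manifest = os.path.join(package_dir, "manifest.ini")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as "MT", "X" and "Y" as written
    if not parser.read(manifest):
        raise FileNotFoundError(manifest)
    listed = list(parser["chromosome_files"].values())
    listed.append(parser["gene_set"]["gtf"])
    return [name for name in listed
            if not os.path.isfile(os.path.join(package_dir, name))]
```

For the Human.GRCh37.75 package above, an empty list means all 25 chromosome files and the GTF are in place.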

Error output:

>>> import pyGeno.bootstrap as B
>>> B.importGenome("Human.GRCh37.75/")
Importing genome package: /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/... (This may take a while)
Importing:
        description:  Human reference genome
        maintainer:  Tariq Daouda
        maintainer_contact:  tariq.daouda@umontreal.ca
        version:  1
Genome:
        species:  human
        name:  GRCh37.75
        source:  http://useast.ensembl.org/info/data/ftp/index.html
...
Importing gene set infos from /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/Homo_sapiens.GRCh37.75.gtf.gz...
Backuping indexes...
Droping all your indexes, (don't worry i'll restore them later)...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap.py", line 105, in importGenome
    PG.importGenome(path, batchSize)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 179, in importGenome
    chros = _importGenomeObjects(gtfFile, chromosomeSet, genome, batchSize, verbose)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 257, in _importGenomeObjects
    Transcript_Raba.flushIndexes()
  File "build/bdist.linux-x86_64/egg/rabaDB/Raba.py", line 547, in flushIndexes
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 148, in dropIndexByName
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 224, in execute
sqlite3.OperationalError: disk I/O error

When I use the following command instead, I get a different error: sqlite3.OperationalError: database or disk is full

>>> from pyGeno.importation.Genomes import *
>>> importGenome('/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/')
Importing genome package: /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/... (This may take a while)
Importing:
        description:  Human reference genome
        maintainer:  Tariq Daouda
Chr progress[~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-:>] 100.00% (2828313/2828312) runtime: 21.471min, remaining: -0.000sc, avg: 0.000sc
progress[~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-:>] 104.00% (26/25) runtime: 0.003sc, remaining: -0.000sc, avg: 0.000sc
saving genome object...
restoring core indexes...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 179, in importGenome
    chros = _importGenomeObjects(gtfFile, chromosomeSet, genome, batchSize, verbose)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 419, in _importGenomeObjects
    Transcript.ensureGlobalIndex('exons')
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/pyGenoObjectBases.py", line 223, in ensureGlobalIndex
    cls._wrapped_class.ensureIndex(fields)
  File "build/bdist.linux-x86_64/egg/rabaDB/Raba.py", line 510, in ensureIndex
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 138, in createIndex
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 224, in execute
sqlite3.OperationalError: database or disk is full

Hi Hao,

It was going well until it stopped at the indexing of exons, which is by far the biggest index that is created automatically. Unfortunately, I can't tell you what caused the error since I don't have admin access to your computer. I can, however, give you some tips.

You need at least 2 GB of free space to store one human reference genome. This does not include the temporary space that SQLite uses while running.
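A quick way to check both locations before starting an import, sketched with the Python 3 standard library. The 2 GB threshold comes from the figure above and is applied to both locations for simplicity; ~/.pyGeno as the database location follows the tip about the .pyGeno folder; and the temp-directory lookup only approximates the order SQLite uses on Unix (SQLITE_TMPDIR, then TMPDIR, then /var/tmp, /usr/tmp, /tmp). All function names are hypothetical.

```python
import os
import shutil

MIN_FREE_BYTES = 2 * 1024 ** 3  # ~2 GB for one human reference genome


def sqlite_temp_dir():
    """Approximate the directory SQLite's Unix VFS picks for temporary files."""
    for candidate in (os.environ.get("SQLITE_TMPDIR"), os.environ.get("TMPDIR"),
                      "/var/tmp", "/usr/tmp", "/tmp", "."):
        if candidate and os.path.isdir(candidate) and os.access(candidate, os.W_OK | os.X_OK):
            return candidate
    return "."


def nearest_existing(path):
    """Walk up from `path` until an existing directory is found (e.g. before ~/.pyGeno exists)."""
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return path


def check_space(db_dir="~/.pyGeno", min_free=MIN_FREE_BYTES):
    """Return {label: (path, free_bytes, enough)} for the database and SQLite temp locations."""
    report = {}
    for label, path in (("database", db_dir), ("sqlite_temp", sqlite_temp_dir())):
        where = nearest_existing(path)
        free = shutil.disk_usage(where).free
        report[label] = (where, free, free >= min_free)
    return report
```

If both locations sit on the same filesystem they share the same free space, so the real requirement is roughly the sum of the two.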

You can find out how to redirect or increase the temporary space used by SQLite here: https://stackoverflow.com/questions/23249843/sqlite3-vacuum-database-or-disk-is-full
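On Unix, SQLite checks the SQLITE_TMPDIR environment variable (then TMPDIR) to decide where its temporary files go. Since it is safest to set it before the interpreter starts, one minimal approach is to launch the import in a child Python process with the variable set. This is a Python 3 sketch; the helper name and the paths in the example comment are hypothetical, while the importGenome call is the one from the question.

```python
import os
import subprocess
import sys


def run_with_sqlite_tmpdir(python_code, tmpdir, python=sys.executable):
    """Run `python_code` in a fresh interpreter whose SQLite temp files go to `tmpdir`."""
    os.makedirs(tmpdir, exist_ok=True)
    # Setting both variables covers SQLite (SQLITE_TMPDIR, then TMPDIR) and
    # most other tools that honour TMPDIR.
    env = dict(os.environ, SQLITE_TMPDIR=tmpdir, TMPDIR=tmpdir)
    return subprocess.run([python, "-c", python_code], env=env, check=True)

# Example (hypothetical paths; `python` must be the interpreter where pyGeno
# is installed, e.g. the Python 2.7 build from the question):
# run_with_sqlite_tmpdir(
#     'import pyGeno.bootstrap as B; B.importGenome("Human.GRCh37.75/")',
#     "/scratch/your_user/sqlite_tmp",
#     python="/path/to/python2.7")
```

check=True makes a failed import raise CalledProcessError instead of passing silently.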

Another possibility is that pyGeno's database has somehow been corrupted. If that is the case, you can erase the .pyGeno folder in your home directory and start a new import.
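Before erasing anything, one option is to ask SQLite itself whether a database file is damaged, and to rename the folder rather than delete it, so nothing is lost if corruption was not the cause. A Python 3 sketch: the thread does not say which file inside ~/.pyGeno holds the database, so its path is a parameter, and both helper names are hypothetical.

```python
import os
import shutil
import sqlite3
import time


def integrity_ok(db_path):
    """True if SQLite's PRAGMA integrity_check reports no problems for db_path."""
    if not os.path.isfile(db_path):
        # sqlite3.connect would otherwise silently create an empty database.
        raise FileNotFoundError(db_path)
    conn = None
    try:
        conn = sqlite3.connect(db_path)
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        return False  # e.g. "file is not a database"
    finally:
        if conn is not None:
            conn.close()
    return rows == [("ok",)]


def move_pygeno_dir_aside(home=None):
    """Rename ~/.pyGeno to a timestamped backup; return the new path, or None if absent."""
    home = home or os.path.expanduser("~")
    src = os.path.join(home, ".pyGeno")
    if not os.path.isdir(src):
        return None
    dst = src + ".bak-" + time.strftime("%Y%m%d-%H%M%S")
    shutil.move(src, dst)
    return dst
```

Once a fresh import succeeds, the backup folder can be deleted to reclaim its space.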

Best,


Thank you very much, Tariq. The problem was fixed once I redirected the temporary file folder.

Hao
