Question

Tool:pyGeno 1.2: Python package for Personalized Genomics and Proteomics

10.2 years ago

Tariq Daouda ▴ 220

pyGeno 1.2 is now available: http://pyGeno.iric.ca.

pyGeno is a Python package that allows you to easily combine Reference Genomes and sets of Polymorphisms to create personalized genomes. Personalized genomes can be used to work directly on the genomes of your subjects and can be translated into Personalized Proteomes.

Multiple sets of polymorphisms can also be combined to leverage their independent benefits, for example:

RNA-seq and DNA-seq for the same individual to improve the coverage
RNA-seq of an individual + dbSNP for validation
Combine the RNA-seq results of several individuals to create a genome containing only the common polymorphisms
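
To make the combinations above concrete, here is a minimal sketch in plain Python. It does not use pyGeno's API: the dictionaries, the function names (`union_sets`, `common_sets`, `personalize`) and the toy sequence are all hypothetical, and only single-nucleotide substitutions are handled.

```python
# Illustrative only: plain dictionaries stand in for polymorphism sets.
# Keys are 0-based positions on a reference sequence, values are alternate alleles.

def union_sets(*snp_sets):
    """Merge sets to improve coverage; later sets win on conflicting positions."""
    merged = {}
    for snps in snp_sets:
        merged.update(snps)
    return merged

def common_sets(*snp_sets):
    """Keep only polymorphisms present, with the same allele, in every set."""
    if not snp_sets:
        return {}
    first, rest = snp_sets[0], snp_sets[1:]
    return {pos: alt for pos, alt in first.items()
            if all(other.get(pos) == alt for other in rest)}

def personalize(reference, snps):
    """Apply single-nucleotide substitutions to a reference sequence."""
    seq = list(reference)
    for pos, alt in snps.items():
        seq[pos] = alt
    return "".join(seq)

reference = "ATGGCCTTAA"          # toy reference sequence
rna_seq = {2: "A", 5: "T"}        # hypothetical calls from RNA-seq
dna_seq = {5: "T", 8: "G"}        # hypothetical calls from DNA-seq

combined = union_sets(rna_seq, dna_seq)   # better coverage
shared = common_sets(rna_seq, dna_seq)    # only the common polymorphisms

print(personalize(reference, combined))
print(personalize(reference, shared))
```

The union corresponds to the "improve the coverage" case and the intersection to the "only the common polymorphisms" case; how conflicts between sets should be resolved is a design choice that this sketch settles arbitrarily (the last set wins).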

pyGeno is also a personal database that gives you access to all the information provided by Ensembl (for both Reference and Personalized Genomes) without the need to query distant HTTP APIs, allowing for much faster and more reliable genome-wide study pipelines.

It also comes with parsers for several file types and various other useful tools.

python SNP rna-seq dbSNP ensembl • 3.9k views

updated 3.0 years ago by Ram 45k • written 10.2 years ago by Tariq Daouda ▴ 220

This sounded like a cool tool, but I was unable to run it at all. The installation fails on my machine right away:

https://github.com/tariqdaouda/pyGeno/issues/2

I also strongly recommend decoupling the data download from the Python code. Python is not all that well suited to downloading massive datasets; at the very least, provide alternative HTTP, rsync, or BitTorrent sources for downloading the data.

updated 3.0 years ago by Ram 45k • written 10.2 years ago by Istvan Albert 102k

Thank you for bringing that up; the pip version was lagging behind. It is fixed now, but I recommend the git version.

I had a look at the issue: the folders containing the datawraps were not included in the pip version. The rest of the installation went fine, and you can import datawraps using the importation module.

I would nonetheless recommend that you either update pyGeno to the latest pip version to get the missing datawraps:

pip install --upgrade pyGeno

Or switch to the git version to get the latest bleeding edge updates.

Python is used for downloads to avoid dependencies on third-party software and to keep the installation as simple as possible. That is also why pyGeno comes with its own set of parsers.

The datawraps shipped with the bootstrap module only contain links to data made available by third parties such as Ensembl and dbSNP. You can also create your own datawraps by downloading the files independently and including them in the tar.gz archive, as explained here and here.

That being said, pyGeno has been tested many times with both Ensembl and dbSNP, and we have never had any problems caused by the initial downloads.
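
As a rough sketch of the "download independently, then package" route, the following uses only the Python standard library to bundle a manifest.ini and the data files next to it into a tar.gz archive. The function name `build_datawrap`, the file names in the demo, and the flat layout (every file at the archive root) are assumptions made for illustration; check the pyGeno documentation for the layout it actually expects.

```python
import os
import tarfile
import tempfile

def build_datawrap(source_dir, archive_path):
    """Bundle manifest.ini and the data files beside it into a .tar.gz archive."""
    manifest = os.path.join(source_dir, "manifest.ini")
    if not os.path.isfile(manifest):
        raise FileNotFoundError("manifest.ini not found in " + source_dir)
    target = os.path.abspath(archive_path)
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            # Skip sub-directories and the archive itself if it lives in source_dir.
            if os.path.isfile(path) and os.path.abspath(path) != target:
                # Assumption: files sit at the archive root, not in a sub-folder.
                archive.add(path, arcname=name)
    return archive_path

# Demo with placeholder files; real data files would be downloaded beforehand.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "my_datawrap")
    os.mkdir(source)
    for filename in ("manifest.ini", "chromosome.1.fa.gz"):
        with open(os.path.join(source, filename), "w") as handle:
            handle.write("placeholder\n")
    archive_file = build_datawrap(source, os.path.join(workdir, "my_datawrap.tar.gz"))
    with tarfile.open(archive_file, "r:gz") as archive:
        names = sorted(archive.getnames())

print(names)
```

Because the files are fetched separately, any download tool (rsync, a browser, a cluster transfer node) can be used before this packaging step.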

Thanks

updated 3.0 years ago by Ram 45k • written 10.2 years ago by Tariq Daouda ▴ 220

Thanks for the fix. I like the concepts behind this package and want to test it out in practice. More feedback to follow.

10.2 years ago by Istvan Albert 102k

Thank you, your feedback is greatly appreciated.

10.2 years ago by Tariq Daouda ▴ 220

Ram · Answer 1 · 2017-08-12

Hi Tariq,

I have a question about importing genome data in pyGeno. The human reference sequence data was downloaded locally on our HPC cluster, and the manifest.ini file was modified as shown below. When I import the genome, it reports a bug: "sqlite3.OperationalError: disk I/O error". However, there is enough free disk space on the HPC. Could you tell me how to fix this issue?

The platform I use is Python 2.7.13 / pyGeno 1.3.1 on CentOS Linux release 7.3.1611 (Core).

Thank you very much.

Hao

manifest.ini

[package_infos]
description = Human reference genome
maintainer = Tariq Daouda
maintainer_contact = tariq.daouda@umontreal.ca
version = 1

[genome]
species = human
name = GRCh37.75
source = http://useast.ensembl.org/info/data/ftp/index.html

[chromosome_files]
10 = Homo_sapiens.GRCh37.75.dna.chromosome.10.fa.gz
11 = Homo_sapiens.GRCh37.75.dna.chromosome.11.fa.gz
12 = Homo_sapiens.GRCh37.75.dna.chromosome.12.fa.gz
13 = Homo_sapiens.GRCh37.75.dna.chromosome.13.fa.gz
14 = Homo_sapiens.GRCh37.75.dna.chromosome.14.fa.gz
15 = Homo_sapiens.GRCh37.75.dna.chromosome.15.fa.gz
16 = Homo_sapiens.GRCh37.75.dna.chromosome.16.fa.gz
17 = Homo_sapiens.GRCh37.75.dna.chromosome.17.fa.gz
18 = Homo_sapiens.GRCh37.75.dna.chromosome.18.fa.gz
19 = Homo_sapiens.GRCh37.75.dna.chromosome.19.fa.gz
1 = Homo_sapiens.GRCh37.75.dna.chromosome.1.fa.gz
20 = Homo_sapiens.GRCh37.75.dna.chromosome.20.fa.gz
21 = Homo_sapiens.GRCh37.75.dna.chromosome.21.fa.gz
22 = Homo_sapiens.GRCh37.75.dna.chromosome.22.fa.gz
2 = Homo_sapiens.GRCh37.75.dna.chromosome.2.fa.gz
3 = Homo_sapiens.GRCh37.75.dna.chromosome.3.fa.gz
4 = Homo_sapiens.GRCh37.75.dna.chromosome.4.fa.gz
5 = Homo_sapiens.GRCh37.75.dna.chromosome.5.fa.gz
6 = Homo_sapiens.GRCh37.75.dna.chromosome.6.fa.gz
7 = Homo_sapiens.GRCh37.75.dna.chromosome.7.fa.gz
8 = Homo_sapiens.GRCh37.75.dna.chromosome.8.fa.gz
9 = Homo_sapiens.GRCh37.75.dna.chromosome.9.fa.gz
MT = Homo_sapiens.GRCh37.75.dna.chromosome.MT.fa.gz
X = Homo_sapiens.GRCh37.75.dna.chromosome.X.fa.gz
Y = Homo_sapiens.GRCh37.75.dna.chromosome.Y.fa.gz

[gene_set]
gtf = Homo_sapiens.GRCh37.75.gtf.gz
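
Before importing, it can help to confirm that every local file named in a manifest like the one above really sits next to it. The sketch below is not part of pyGeno; `missing_manifest_files` is a hypothetical helper, the demo file names are placeholders, and it uses Python 3's `configparser` (on Python 2.7 the module is `ConfigParser` and the details differ).

```python
import configparser
import os
import tempfile

def missing_manifest_files(manifest_path):
    """Return the local files listed in the manifest that do not exist on disk."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as "MT", "X", "Y" case-sensitive
    with open(manifest_path) as handle:
        parser.read_file(handle)
    base = os.path.dirname(os.path.abspath(manifest_path))
    missing = []
    for section in ("chromosome_files", "gene_set"):
        if not parser.has_section(section):
            continue
        for _, value in parser.items(section):
            if "://" in value:
                continue  # remote link, not a local file
            if not os.path.isfile(os.path.join(base, value)):
                missing.append(value)
    return missing

# Demo with a tiny manifest in which only one of the listed files exists.
with tempfile.TemporaryDirectory() as workdir:
    manifest_file = os.path.join(workdir, "manifest.ini")
    with open(manifest_file, "w") as handle:
        handle.write(
            "[chromosome_files]\n"
            "1 = chr1.fa.gz\n"
            "MT = chrMT.fa.gz\n"
            "[gene_set]\n"
            "gtf = genes.gtf.gz\n"
        )
    open(os.path.join(workdir, "chr1.fa.gz"), "w").close()
    missing = missing_manifest_files(manifest_file)

print(missing)
```

An empty result only shows that the files are present; it says nothing about whether they are complete or readable.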

bug

>>> import pyGeno.bootstrap as B
>>> B.importGenome("Human.GRCh37.75/")
Importing genome package: /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/... (This may take a while)
Importing:
        description:  Human reference genome
        maintainer:  Tariq Daouda
        maintainer_contact:  tariq.daouda@umontreal.ca
        version:  1
Genome:
        species:  human
        name:  GRCh37.75
        source:  http://useast.ensembl.org/info/data/ftp/index.html
...
Importing gene set infos from /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/Homo_sapiens.GRCh37.75.gtf.gz...
Backuping indexes...
Droping all your indexes, (don't worry i'll restore them later)...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap.py", line 105, in importGenome
    PG.importGenome(path, batchSize)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 179, in importGenome
    chros = _importGenomeObjects(gtfFile, chromosomeSet, genome, batchSize, verbose)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 257, in _importGenomeObjects
    Transcript_Raba.flushIndexes()
  File "build/bdist.linux-x86_64/egg/rabaDB/Raba.py", line 547, in flushIndexes
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 148, in dropIndexByName
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 224, in execute
sqlite3.OperationalError: disk I/O error
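
Since the traceback ends while an index is being dropped, one way to narrow the problem down is to check whether SQLite can create a table, build an index, and drop it in a given directory, independently of pyGeno. This is only a diagnostic sketch: `sqlite_works_in` is a hypothetical helper, and the directory to test (wherever pyGeno keeps its database on your system) is not stated in this thread. A network filesystem with unreliable file locking is a frequent cause of this error on clusters, but that is an assumption here, not a confirmed diagnosis.

```python
import os
import sqlite3
import tempfile

def sqlite_works_in(directory):
    """Return (True, None) if SQLite can create, index, and drop an index there."""
    fd, path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)  # SQLite treats the empty file as a new database
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
            conn.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",)])
            conn.execute("CREATE INDEX idx_name ON t (name)")
            conn.commit()
            conn.execute("DROP INDEX idx_name")
            conn.commit()
        finally:
            conn.close()
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        # Remove the scratch database and any side files SQLite may have left.
        for suffix in ("", "-journal", "-wal", "-shm"):
            if os.path.exists(path + suffix):
                os.remove(path + suffix)

# Demo on the local temporary directory; replace it with the directory to test.
ok, error = sqlite_works_in(tempfile.gettempdir())
print(ok, error)
```

If the check fails in one directory but passes on node-local storage, the filesystem is the likely culprit rather than the amount of free space.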