How to obtain subreads from *.bax.h5 files
4.1 years ago
wes ▴ 90

I downloaded RSII data from NCBI, consisting of three *.bax.h5.1 files and one bas.h5.1 file.

Next, I would like to extract the subreads using pbh5tools.

According to the guide below, the input should be the bas.h5 file, but that gives an error. Is the command below correct? Or should I use a bax.h5.1 file as input? I tried that as well (shown below), but it also failed.

http://lira.no-ip.org:8080/doc/python-pbh5tools/html/#installation

bash5tools.py --outFilePrefix m150803_212153_42216_c100858342550000001823192601241650_s1_p0.1.subreads --readType subreads --outType fastq m150803_212153_42216_c100858342550000001823192601241650_s1_p0.bas.h5.1

Traceback (most recent call last):
  File "/home/cbr01/anaconda3/envs/py2/bin/bash5tools.py", line 166, in <module>
    sys.exit(BasH5ToolsRunner().start())
  File "/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/util/ToolRunner.py", line 85, in start
    return self.run()
  File "/home/cbr01/anaconda3/envs/py2/bin/bash5tools.py", line 121, in run
    inBasH5 = BasH5Reader(self.args.inFile)
  File "/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py", line 660, in __init__
    for fn in self.file["/MultiPart/Parts"] ]
  File "/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py", line 304, in __init__
    raise IOError("Invalid or nonexistent bax/bas file %s" % filename)
IOError: Invalid or nonexistent bax/bas file /media/cbr01/analysis/WEE/DNASeq/SRX2718652/m150803_212153_42216_c100858342550000001823192601241650_s1_p0.1.bax.h5

bash5tools.py --outFilePrefix m150803_212153_42216_c100858342550000001823192601241650_s1_p0.1.subreads --readType subreads --outType fastq m150803_212153_42216_c100858342550000001823192601241650_s1_p0.1.bax.h5.1
/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py:273: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
numEvent = h5Group["ZMW/NumEvent"].value
/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py:274: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
holeNumber = h5Group["ZMW/HoleNumber"].value
/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py:349: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
holeNumbers = self._mainBasecallsGroup["ZMW/HoleNumber"].value
/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py:356: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
fh["/PulseData/Regions"].value)
/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py:379: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
holeStatus = self._mainBasecallsGroup["ZMW/HoleStatus"].value
/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py:388: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
(self._mainBasecallsGroup["ZMW/NumEvent"].value > 0) &
/home/cbr01/anaconda3/envs/py2/lib/python2.7/site-packages/pbcore/io/BasH5IO.py:565: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
self.__metricCache[name] = self._mainBasecallsGroup[k].value
genome • 1.2k views
usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug]
                     [--outFilePrefix OUTFILEPREFIX]
                     [--readType {ccs,subreads,unrolled}] [--outType OUTTYPE]
                     [--minLength MINLENGTH] [--minReadScore MINREADSCORE]
                     [--minPasses MINPASSES]
                     input.bas.h5

Tool for extracting data from .bas.h5 files

positional arguments:
  input.bas.h5          input .bas.h5 filename

You need to use the bas file.

IOError: Invalid or nonexistent bax/bas file /media/cbr01/analysis/WEE/DNASeq/SRX2718652/m150803_212153_42216_c100858342550000001823192601241650_s1_p0.1.bax.h5

There is also an error about the bax file. Are these files in the same directory? Did you download them as binary files?
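
On the question of whether the downloads are intact binary files: one quick check, sketched here in Python, is to confirm that each file starts with the 8-byte HDF5 signature. The helper name `looks_like_hdf5` is made up for this illustration, and the check assumes the files carry no HDF5 user block (in which case the signature could sit at a later offset). A file that fails the check may be an HTML error page or a truncated download rather than real bas/bax data.

```python
import glob

# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` begins with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


if __name__ == "__main__":
    # Check every bas/bax download in the current directory,
    # including names that carry a trailing ".1".
    for name in sorted(glob.glob("*.h5*")):
        status = "HDF5" if looks_like_hdf5(name) else "NOT HDF5"
        print("%-10s %s" % (status, name))
```

Run it from the directory holding the downloads. It only looks at the header, so a pass does not prove the whole file arrived intact.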


Dear Genomax,

I downloaded the PacBio data in its original format (three bax.h5.1 files and one bas.h5.1 file) directly from the Data access section of https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR5428623

How do I download them as binary files?

Yes, they are in the same directory:

(base) cbr01@cbr01-Precision-T7600:/media/cbr01/analysis/WEE/DNASeq/SRX2718652$ ls
m150803_212153_42216_c100858342550000001823192601241650_s1_p0.1.bax.h5.1
m150803_212153_42216_c100858342550000001823192601241650_s1_p0.2.bax.h5.1
m150803_212153_42216_c100858342550000001823192601241650_s1_p0.3.bax.h5.1
m150803_212153_42216_c100858342550000001823192601241650_s1_p0.bas.h5.1
m150803_212153_42216_c100858342550000001823192601241650_s1_p0.metadata.xml
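
One thing stands out when the listing is compared with the traceback: BasH5Reader looks for a part named `..._p0.1.bax.h5`, while the files on disk end in `.bax.h5.1`. A possible workaround, assuming the trailing `.1` was added at download time and is not part of the original file names, is to create symlinks without that suffix and point bash5tools.py at the `.bas.h5` link. The sketch below is only an illustration of that idea; the function names `strip_download_suffix` and `link_expected_names` are hypothetical and not part of pbh5tools or pbcore.

```python
import os


def strip_download_suffix(name, suffix=".1"):
    """Drop a trailing `suffix` only when it directly follows '.h5'."""
    if name.endswith(".h5" + suffix):
        return name[: -len(suffix)]
    return name


def link_expected_names(directory, suffix=".1"):
    """Symlink each '*.h5<suffix>' file to the same name without the suffix.

    Existing files or links are left untouched. Returns the link names created.
    """
    created = []
    for name in sorted(os.listdir(directory)):
        target = strip_download_suffix(name, suffix)
        if target == name:
            continue  # not a suffixed .h5 file, e.g. the metadata.xml
        link_path = os.path.join(directory, target)
        if os.path.lexists(link_path):
            continue  # never overwrite anything already present
        os.symlink(name, link_path)  # relative link inside the same directory
        created.append(target)
    return created


if __name__ == "__main__":
    for link in link_expected_names("."):
        print("created link: %s" % link)
```

After running it in the data directory, the command would take `m150803_212153_42216_c100858342550000001823192601241650_s1_p0.bas.h5` as its input. Renaming the files instead of linking them should have the same effect, if keeping the downloaded names does not matter.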
