Error 1: NULL byte detected when trying to segment by PSCBS
7.8 years ago
nbay13 • 0

I recently installed CNVkit on a computing cluster by cloning the development version (0.8.6dev0) from GitHub, then building and installing it. I then installed all of the Python dependencies with pip and the necessary packages in R. When I tried to test the installation as explained on GitHub, it failed at the segment call with the following error:

[nbay13@n7157 test]$ make
python ../cnvkit.py segment -p 2 --drop-low-coverage -t .01 build/p2-5_5.cnr -o build/p2-5_5.cns
Traceback (most recent call last):
  File "../cnvkit.py", line 13, in <module>
    args.func(args)
  File "/u/home/n/nbay13/cnvkit/cnvlib/commands.py", line 597, in _cmd_segment
    processes=args.processes)
  File "/u/home/n/nbay13/cnvkit/cnvlib/segmentation/__init__.py", line 42, in do_segmentation
    for _, ca in cnarr.by_chromosome())))
  File "/u/home/n/nbay13/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 581, in result_iterator
    yield future.result()
  File "/u/home/n/nbay13/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 405, in result
    return self.__get_result()
  File "/u/home/n/nbay13/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 357, in __get_result
    raise type(self._exception), self._exception, self._traceback
ValueError: Unexpected dataframe contents:
NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead
"p2-5_5"        "chr1"  142517943       249066405       671     0.00824754582746605

I have also tried to segment my own data without multiprocessing, to no avail.

Switching the method to haar works, but both PSCBS and Lasso fail. So the problem might have to do with R (my current version is 3.3.0, but I've also tried 3.2.3). If possible, I would like to use PSCBS.

Any help would be much appreciated.

cnvkit copy number exome segment • 3.9k views

The error:

NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead

comes from Python's csv parser. Find the file it is trying to read, check that it is well formed, and work from there to find the cause. See the answer to the same question here.
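One way to run that check is to scan the suspect file in binary mode and report where any NUL bytes sit. This is only a minimal sketch: `find_nul_bytes` is a hypothetical helper, not part of CNVkit or pandas, and it assumes the text being parsed has been saved to a file you can open.

```python
def find_nul_bytes(path, chunk_size=1 << 20):
    """Return the byte offsets of every NUL (0x00) byte in the file at `path`.

    Hypothetical helper for illustration; not part of CNVkit or pandas.
    """
    offsets = []
    position = 0  # offset of the current chunk within the file
    # Binary mode: nothing is decoded or newline-translated before we look.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                index = chunk.find(b"\x00", start)
                if index == -1:
                    break
                offsets.append(position + index)
                start = index + 1
            position += len(chunk)
    return offsets
```

An empty list means the file contains no NUL bytes, so the problem would be somewhere else. If offsets do come back, viewing the bytes around the first one (for example with `xxd` or `od -c`) should show whether the file is truncated, padded, or written in a multi-byte encoding such as UTF-16.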

7.8 years ago
Eric T. ★ 2.8k

Thanks for reporting. Have you used any previous version of CNVkit successfully? The most recent stable version is 0.8.5, and installing that through conda (which pulls in its own copy of the R interpreter and PSCBS package) seems to work for most people. The development version is in flux, but the automated TravisCI testing shows PSCBS segmentation still works with the default conda installation, at least on Linux. Since the crash you saw is in the built-in test suite, it is probably due to some incompatibility in your R installation that introduces a null byte somewhere, truncates the file, or causes I/O to be done in the wrong mode (text versus binary).

Could you try running the segmentation command without multiprocessing and post the traceback to CNVkit's issue tracker? Or, if you don't want to set up a GitHub account, posting it here is OK too.


This is my first time trying CNVkit on this computing cluster. Previously I had it working on a local computer with version 0.8.2, I believe. I'll try installing through conda. Here's the traceback with no multiprocessing for my sample data:

Traceback (most recent call last):
  File "/u/home/n/nbay13/cnvkit/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/u/home/n/nbay13/cnvkit/cnvlib/commands.py", line 597, in _cmd_segment
    processes=args.processes)
  File "/u/home/n/nbay13/cnvkit/cnvlib/segmentation/__init__.py", line 32, in do_segmentation
    skip_outliers, save_dataframe, rlibpath)
  File "/u/home/n/nbay13/cnvkit/cnvlib/segmentation/__init__.py", line 117, in _do_segmentation
    sample_id=cnarr.sample_id)
  File "/u/home/n/nbay13/cnvkit/skgenome/tabio/__init__.py", line 78, in read
    dframe = reader(infile, **kwargs)
  File "/u/home/n/nbay13/cnvkit/skgenome/tabio/seg.py", line 68, in read_seg
    for sid, dframe in results:
  File "/u/home/n/nbay13/cnvkit/skgenome/tabio/seg.py", line 147, in parse_seg
    (err, next(handle)))
ValueError: Unexpected dataframe contents:
NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead
"dedup_reads_GS027_10mil"       "chr1"  143401632       248919830       16965   -0.14007255304696

The conda installation fixed the problem (both PSCBS and multiprocessing work), although I still wonder what was wrong. Thanks for the help.
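For anyone who wants to compare a hand-built R setup with the conda one, a rough probe like the following could be run in each environment. It is a sketch under assumptions, not a diagnosis of this particular failure: `describe_r_setup` is a hypothetical helper, it requires Python 3 (the tracebacks above are from Python 2.7), and it only reports which `Rscript` is first on `PATH`, what R and PSCBS versions it sees, and whether its output contains NUL bytes.

```python
import shutil
import subprocess


def describe_r_setup(rscript="Rscript"):
    """Report which Rscript is on PATH and what it prints for R and PSCBS.

    Hypothetical helper for illustration. Returns None if the executable
    cannot be found, otherwise a dict with the raw results.
    """
    exe = shutil.which(rscript)
    if exe is None:
        return None
    # R code: print the R version string, then the installed PSCBS version.
    # packageVersion() raises an R error if PSCBS is not installed, which
    # shows up here as a nonzero return code plus a message on stderr.
    expr = (
        'writeLines(R.version.string); '
        'writeLines(as.character(packageVersion("PSCBS")))'
    )
    proc = subprocess.run(
        [exe, "--vanilla", "-e", expr],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=120,
    )
    return {
        "path": exe,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        # True if R's output, read as raw bytes, already contains NUL bytes.
        "has_nul": b"\x00" in proc.stdout,
    }
```

Running `describe_r_setup()` once with the cluster's R on `PATH` and once inside the conda environment gives two dictionaries to compare. Differences in `path`, the reported versions, or `has_nul` would be places to start looking.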
