RSeQC: Problems calling geneBody_coverage.py on a directory
1
0
Entering edit mode
10.1 years ago
fr2259 • 0

I'd like to use the geneBody_coverage.py script in RseQC to analyze some bam files. Because I have a lot of bam files, I'd like to be able to do this in a batch. The geneBody_coverage documentation states that the script can be called on an entire directory containing bam files and will generate a single graph showing all gene body coverage curves. I have tested the script on an individual bam file as:

$ geneBody_coverage.py -r /Users/fr2259/Py_lib/hg19.bed -i accepted_hits_0.bam -o output

This ran with no errors and produced the appropriate graph.

When I try to run the script on a directory containing all the bam files (and nothing but bam files), as directed by the documentation, I run into an error. Here's the command:

$ geneBody_coverage.py -r /Users/fr2259/Py_lib/hg19.bed -i /Users/fr2259/SeqData/bamFiles/ -o output

And here's the error:

Traceback (most recent call last):
  File "/Users/fr2259/Py_lib/RSeQC-2.3.9/scripts/geneBody_coverage.py", line 86, in <module>
    main()
  File "/Users/fr2259/Py_lib/RSeQC-2.3.9/scripts/geneBody_coverage.py", line 70, in main
    obj = SAM.ParseBAM(options.input_file)
  File "/Library/Python/2.7/site-packages/RSeQC-2.3.9-py2.7-macosx-10.9-intel.egg/qcmodule/SAM.py", line 2320, in __init__
    self.samfile = pysam.Samfile(inputFile,'r')
  File "csamtools.pyx", line 597, in pysam.csamtools.Samfile.__cinit__ (pysam/csamtools.c:6532)
  File "csamtools.pyx", line 760, in pysam.csamtools.Samfile._open (pysam/csamtools.c:8303)
ValueError: file header is empty (mode='r') - is it SAM/BAM format?

In case this was related to one file having a problem with its header, I called the geneBody_coverage.py script as part of a for loop. No errors came up, but each resulting gene body coverage graph was overwritten by the subsequent one. This is likely a problem of me being a novice at running commands from the terminal.

Why would the same script work perfectly fine when given an individual file, but throw an error that suggests that there is a problem with the input file formats when called to operate on a batch?

Thanks

rna-seq • 6.2k views
ADD COMMENT
1
Entering edit mode
10.1 years ago
Coryza ▴ 430

You could try to email the author of the script; liguow@bcm.edu.

Edit: Have you tried this?

  • get a clean install of python 2.7
  • install cython by: easy_install cython
  • download pysam-0.7 from https://code.google.com/p/pysam/downloads/detail?name=pysam-0.7.tar.gz&can=2&q=
  • unpack pysam
  • Patch csamtools.pyx like described here: https://code.google.com/p/pysam/issues/detail?id=108
  • run python setup.py install in the pysam directory. This will install a version of pysam that can read ion torrent files.
  • Install RSeQC 2.3.4.
  • Go to the python site-packages directory of RSeQC and delete all modules that are from the pysam install of RSeQC so RSeQC uses the psam version we installed into the python installation. (just delete the files that are also present in your python dist-packages directory)

https://code.google.com/p/rseqc/issues/detail?id=17

ADD COMMENT
0
Entering edit mode

Thanks for the reply. I will actually shoot the author an email.

I think that the problem is related to the version of RSeQC that I was using (2.4). There seem to be problems associated with calling the script on multiple files. Maybe a bug related to running it on my OS (OSX 10.9.4)? It works just fine with a single files, so I'll see what I can do with that. =P

ADD REPLY

Login before adding your answer.

Traffic: 1729 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6