I'd like to use the geneBody_coverage.py
script in RseQC to analyze some bam files. Because I have a lot of bam files, I'd like to be able to do this in a batch. The geneBody_coverage
documentation states that the script can be called on an entire directory containing bam files and will generate a single graph showing all gene body coverage curves. I have tested the script on an individual bam file as:
$ geneBody_coverage.py -r /Users/fr2259/Py_lib/hg19.bed -i accepted_hits_0.bam -o output
This ran with no errors and produced the appropriate graph.
When I try to run the script on a directory containing all the bam files (and nothing but bam files), as directed by the documentation, I run into an error. Here's the command:
$ geneBody_coverage.py -r /Users/fr2259/Py_lib/hg19.bed -i /Users/fr2259/SeqData/bamFiles/ -o output
And here's the error:
Traceback (most recent call last):
File "/Users/fr2259/Py_lib/RSeQC-2.3.9/scripts/geneBody_coverage.py", line 86, in <module>
main()
File "/Users/fr2259/Py_lib/RSeQC-2.3.9/scripts/geneBody_coverage.py", line 70, in main
obj = SAM.ParseBAM(options.input_file)
File "/Library/Python/2.7/site-packages/RSeQC-2.3.9-py2.7-macosx-10.9-intel.egg/qcmodule/SAM.py", line 2320, in __init__
self.samfile = pysam.Samfile(inputFile,'r')
File "csamtools.pyx", line 597, in pysam.csamtools.Samfile.__cinit__ (pysam/csamtools.c:6532)
File "csamtools.pyx", line 760, in pysam.csamtools.Samfile._open (pysam/csamtools.c:8303)
ValueError: file header is empty (mode='r') - is it SAM/BAM format?
In case this was related to one file having a problem with its header, I called the geneBody_coverage.py
script as part of a for loop. No errors came up, but each resulting gene body coverage graph was overwritten by the subsequent one. This is likely a problem of me being a novice at running commands from the terminal.
Why would the same script work perfectly fine when given an individual file, but throw an error that suggests that there is a problem with the input file formats when called to operate on a batch?
Thanks
Thanks for the reply. I will actually shoot the author an email.
I think that the problem is related to the version of RSeQC that I was using (2.4). There seem to be problems associated with calling the script on multiple files. Maybe a bug related to running it on my OS (OSX 10.9.4)? It works just fine with a single files, so I'll see what I can do with that. =P