Help needed, I was trying to extract methylation ratios from a "bam", which was generated by bisulphite sequence mapping with BSMAP. I am getting "ZeroDivisionError" and been wondering what might be happening. This is the command I issued:
python methratio.py --ref=rn5.fa --out=brain_control_1.txt --context=CG --pair --zero-meth brain_control_1.bam
And this was the message from std output:
[methratio] @Mon Sep 8 17:12:52 2014 loading reference file: rn5.fa ...
[methratio] @Mon Sep 8 17:16:17 2014 reading brain_control_1.bam ...
[methratio] @Mon Sep 8 17:17:19 2014 read 10000000 lines [
methratio] @Mon Sep 8 17:17:34 2014 read 12309445 lines
[methratio] @Mon Sep 8 17:17:34 2014 writing brain_control_1.txt ...
Traceback (most recent call last): File "methratio.py", line 253, in disp('total %d valid mappings, %d covered cytosines, average coverage: %.2f fold.' % (nmap, nc, float(nd)/nc)) ZeroDivisionError: float division by zero
This is a snippet of the bam file:
SRR955121-NP070.24669651 0 chr1 32402 255 36M * 0 0 TCTGCTCGAAAAAGTTAGGTGTGAGGAGCTGTCCTC ____K_aaaa^f^ff_\\_Zfffacaddda\VY_^B NM:i:1 ZS:Z:++ SRR955121-NP070.2905526 0 chr1 32406 255 36M * 0 0 CTCGAAAGAGTTAGGTGTGAGGAGCTGTCCTCACAT hhhhhhhhhhhhhhhghhghhhghhhgghhhhhhhh NM:i:0 ZS:Z:++ SRR955121-NP070.94067 16 chr1 32538 255 36M * 0 0 CTTCCCTACTTAAAGGGACTCGTACGCTTTGGTATG _hcgefhhfaghhhhhhhhhhhhhhhehhhhhfhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.23202665 16 chr1 32549 255 36M * 0 0 AAAGGGACTCGTACGCTTTGGTATGTCATCTGGCAT hhhhhhffadfchhgehhhhhhhfhhghhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.19539704 16 chr1 32566 255 36M * 0 0 TTGGTATGTCATCTGGCATAATTGGCAGCGACCACT fhghhhhhhhfhhghhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.1743333 16 chr1 32568 255 36M * 0 0 GGTATGTCATCTGGCATAATTGGCAGCGACCACTCA gffdfRfggggfaffagdfggddggfffaffffecc NM:i:0 ZS:Z:-+ SRR955121-NP070.6873790 16 chr1 32568 255 36M * 0 0 GGTATGTCATCTGGCATAATTGGCAGCGACCACTCA hhhhhhhhffhfhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.16344226 16 chr1 32572 255 36M * 0 0 TGTCATCTGGCATAATTGGCAGCGACCACTCAGAGA ghhhhhfffffQhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.14765321 16 chr1 32589 255 36M * 0 0 GGCAGCGACCACTCAGAGAGTGACACGTCCTAGGTG hghhhhhghh_hghhhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.3490702 16 chr1 32598 255 36M * 0 0 CACTCAGAGAGTGACACGTCCTAGGTGAGGATAGTT hghhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.7817525 16 chr1 32619 255 36M * 0 0 TAGGTGAGGATAGTTTCCTATATAAGGGATGGTTAT fhhQhhghhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.16151845 16 chr1 32619 255 36M * 0 0 TAGGTGAGGATAGTTTCCTATATAAGGGATGGTTAT c[cccca_ca\aaaaaac\_c]_ZYIccacacaYaa NM:i:0 ZS:Z:-+ SRR955121-NP070.5635950 16 chr1 32621 255 36M * 0 0 GGTGAGGATAGTTTCCTATATAAGGGATGGTTATTC hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.12453704 16 chr1 32738 255 36M * 0 0 GTGCCGCATCGTTCCTGCTGGCGAGATGTAGCGTGG hhhhhhghchgfghhhhhhhhhhhhhhhhhhhhhhh NM:i:0 ZS:Z:-+ SRR955121-NP070.25129342 16 chr1 32738 255 36M * 0 0 GTGCCGCGTCGTTCCTGCTGGCGAGATGTAGCGTGG cgggggffffagagggcgfgcgggggdgfgfggfgg NM:i:0 ZS:Z:-+ SRR955121-NP070.2405969 16 chr1 32739 255 36M * 0 0 TGCCGCATCGTTCCTGCTGGCGAGATGTAGCGTGGG hhhhhhhhhhhhhhhghhghhhhhhhghhhhhghhh NM:i:0 ZS:Z:-+
I will appreciate it very much if you could help me out here. My project is stuck because of this problem. I have sent an email to the authors of BSMAP and still waiting for a response.
Thanks!
Find that line and comment it out. Put a # in front of it.
:) I like that solution, but I think it's actually telling him that the program is doing nothing useful.
I know nothing about the particular software, but the error message is giving you a clue. When the script trys to divide nd by nc you get a ZeroDivisionError (i.e. nc, which appears to be the number of covered cytosines, is 0). If this is just a "progress" message you could comment it out an skip it as Istvan suggests, but be ware it may indicate trouble with your bam or reference file.
Thanks, Matt! What's the best way out?
I believe that contacting the authors would be best.
But, while I await response from the authors, is there an alternate way of extracting the methylation ratios/frequencies from the bam file? Thanks!
(See my answer below but I think this should have been a question on its own...)
This happens whenever a chromosome has no coverage. The easiest solution is to just change the code to check
nc>0
before printing.BTW, if you were asking me about this on SEQanswers and I forgot to reply then sorry about that. It's easy to lose track of threads there.
Thanks, Devon, I will give it a try and get back to you.
So I modified the code to print only when "nc >0". The script ran all right but the output file was blank, save the headers. What this is telling me is that there was no coverage in the ENTIRE genome, not just particular chromosome(s)! Is this normal?
That'd be pretty unusual. Can you confirm that the chromosomes names in the fasta file are identical to those in the BAM header (just
samtools view -H brain_control_1.bam
)? The most likely cause of this is aligning against an Ensembl reference and then using a UCSC reference in this step (or vice versa).I've checked. The bam file header has apprx 2739 chromosome entries. I used the same fasta file (downloaded from UCSC), which was used for the mapping.
Very strange. Can you post the BAM file somewhere (google drive, drop box, etc.)? That'd make it easier to just track a couple reads and see where the problem is occurring.
Thanks! Here is it:
https://www.dropbox.com/s/cs1q26hpx3zqw10/brain_control_1.bam?dl=0