Hi there,
So I have received some Bismark output files:
Bismark (SAM/BAM)
Mbias files & Plots
Methylation Extractor reports
The documentation is very good and I have a good understanding of all the files. However, in my data from the methylation extractor there are several sites whose chromosome field does not contain a chromosome name.
According to the methylation extractor documentation, the report files should have the following columns:
(1) seq-ID (2) methylation state (3) chromosome (4) start position (= end position) (5) methylation call
In my files I have as follows:
HWI-ST571:431:C43FMACXX:8:2316:9928:82689_1:N:0:GTCCGC - chr9 23913539 z
HWI-ST571:431:C43FMACXX:8:2316:9928:82689_1:N:0:GTCCGC + chr9 23913531 Z
HWI-ST571:431:C43FMACXX:8:2316:9928:82689_1:N:0:GTCCGC + chr9 23913529 Z
HWI-ST571:431:C43FMACXX:8:2316:9928:82689_1:N:0:GTCCGC - chr9 23913485 z
HWI-ST571:431:C43FMACXX:8:2316:9928:85484_1:N:0:GTCCGC + chr2 28594224 Z
HWI-ST571:431:C43FMACXX:8:2316:9928:85484_1:N:0:GTCCGC - chr2 28594341 z
HWI-ST571:431:C43FMACXX:8:2316:9928:85757_1:N:0:GTCCGC - 30126 6594 z # <- this line
HWI-ST571:431:C43FMACXX:8:2316:9928:85757_1:N:0:GTCCGC - chr2 223724995 z
HWI-ST571:431:C43FMACXX:8:2316:9929:12206_1:N:0:GTCCGC - chr12 64474195 z
HWI-ST571:431:C43FMACXX:8:2316:9929:59562_1:N:0:GTCCGC + chr20 55837341 Z
HWI-ST571:431:C43FMACXX:8:2316:9929:59562_1:N:0:GTCCGC - chr20 55837591 z
HWI-ST571:431:C43FMACXX:8:2316:9929:59562_1:N:0:GTCCGC + chr20 55837557 Z
HWI-ST571:431:C43FMACXX:8:2316:9929:59562_1:N:0:GTCCGC + chr20 55837519 Z
HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + chr5 92603043 Z
HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + 26965 45121 Z #<- this line
HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + 26965 301 Z #<- this line
HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + chr5 92603089 Z
HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + chr5 92603122 Z
I am not sure about these lines, as they seem to be in a strange format. Initially I thought maybe there was a formatting issue and the "chr" prefix had been skipped, but none of the numbers correspond to a CG site.
When I run bismark2bedgraph the job completes and these sites are also listed, but without the chromosomal location I cannot locate the methylation site.
Have any of you observed this before, or do you know what has happened?
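To see how many rows are affected and which names appear in the chromosome column, something like the following could be used. This is only a rough sketch: it assumes whitespace-separated rows in the five-column layout listed above, treats a "chr" prefix as the sign of a normal chromosome name, and the function name split_by_chrom_prefix is made up for illustration (it is not part of Bismark).

```python
from collections import Counter


def split_by_chrom_prefix(lines, prefix="chr"):
    """Split call rows by whether column 3 starts with the expected prefix.

    Hypothetical helper, not part of Bismark. Assumes five
    whitespace-separated columns: seq-ID, methylation state (+/-),
    chromosome, position, methylation call.
    """
    expected, unexpected = [], []
    for line in lines:
        fields = line.split()
        # Skip blank lines, header lines and anything else that is not
        # a five-column call row with "+" or "-" in the second column.
        if len(fields) != 5 or fields[1] not in ("+", "-"):
            continue
        if fields[2].startswith(prefix):
            expected.append(fields)
        else:
            unexpected.append(fields)
    return expected, unexpected


# A few rows copied from the output above (without the "<- this line" notes)
sample = [
    "HWI-ST571:431:C43FMACXX:8:2316:9928:85484_1:N:0:GTCCGC - chr2 28594341 z",
    "HWI-ST571:431:C43FMACXX:8:2316:9928:85757_1:N:0:GTCCGC - 30126 6594 z",
    "HWI-ST571:431:C43FMACXX:8:2316:9928:85757_1:N:0:GTCCGC - chr2 223724995 z",
    "HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + chr5 92603043 Z",
    "HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + 26965 45121 Z",
    "HWI-ST571:431:C43FMACXX:8:2316:9929:89987_1:N:0:GTCCGC + 26965 301 Z",
]

expected, unexpected = split_by_chrom_prefix(sample)

# Tally the names that do not look like chromosomes
counts = Counter(fields[2] for fields in unexpected)
for name, n in counts.most_common():
    print(name, n)
```

For a real report, the lines could come from an open file handle instead of the sample list, since the function only needs an iterable of strings.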
Cheers
That looks like a bug in the program. Felix, the author of Bismark, isn't particularly active here but is on SEQanswers, so you might just post this as a bug report on the Bismark thread over there. In the interim, if you convert your alignments to a BAM file and sort and index it, you can use PileOMeth to make the bedGraph files directly.
For those coming across this later, there's a follow-up on SEQanswers.