Bismark RRBS data analysis
5.6 years ago

Hello! I'm quite new to methyl-seq data, and I've just completed an RRBS analysis with Bismark. I got some text files as results, but I'm not sure how to interpret them. What is the best tool for visualizing them?

Some of the text file names start with CHG_CTOB, CHG_CTOT, CHG_OB, etc., some start with CHH, and there is also a bias report. I'm not sure which ones I should use. Cheers!

rrbs methylation bismark methyl-seq • 4.0k views

Doesn't Bismark produce an HTML report these days? If not, run MultiQC at the very least.

5.6 years ago

It sounds like you need to familiarize yourself a bit more with the data and data formats for RRBS.

The QC protocol and a paper by the Bismark authors will help you understand the biases you should be looking out for and the Bismark website itself also offers a couple of sample HTML output files to which you could compare your results.

To understand what CHG* etc. stand for, you'll find plenty of information in the Bismark publication and the Bismark user guide, which includes, for example, the following passage:

Bisulfite treatment of DNA and subsequent PCR amplification can give rise to four (bisulfite converted) strands for a given locus. Depending on the adapters used, BS-Seq libraries can be constructed in two different ways:

  1. If a library is directional, only reads which are (bisulfite converted) versions of the original top strand (OT) or the original bottom strand (OB) will be sequenced. Even though the strands complementary to OT (CTOT) and OB (CTOB) are generated in the BS-PCR step they will not be sequenced as they carry the wrong kind of adapter at their 5’-end. By default, Bismark performs only 2 read alignments to the OT and OB strands, thereby ignoring alignments coming from the complementary strands as they should theoretically not be present in the BS-Seq library in question.
  2. Alternatively, BS-Seq libraries can be constructed so that all four different strands generated in the BS-PCR can and will end up in the sequencing library with roughly the same likelihood. In this case all four strands (OT, CTOT, OB, CTOB) can produce valid alignments and the library is called non-directional. Specifying --non_directional instructs Bismark to use all four alignment outputs.

To summarise again: alignments to the original top strand or to the strand complementary to the original top strand (OT and CTOT) will both yield methylation information for cytosines on the top strand. Alignments to the original bottom strand or to the strand complementary to the original bottom strand (OB and CTOB) will both yield methylation information for cytosines on the bottom strand, i.e. they will appear to yield methylation information for G positions on the top strand of the reference genome.
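To make the naming scheme concrete, here is a minimal Python sketch that reads the cytosine context and strand off a methylation extractor file name and reports which reference strand the calls describe, following the summary above. It assumes file names of the form `<context>_<strand>_<sample>.txt`, matching the prefixes mentioned in the question. The function name and the sample file names are hypothetical, and this is not part of Bismark itself.

```python
import os

# Cytosine contexts that appear as the first part of the file name.
CONTEXTS = {"CpG", "CHG", "CHH"}

# Which reference strand the calls describe, per the summary above:
# OT/CTOT -> cytosines on the top strand, OB/CTOB -> bottom strand.
STRANDS = {"OT": "top", "CTOT": "top", "OB": "bottom", "CTOB": "bottom"}


def describe_extractor_file(path):
    """Interpret a file name such as 'CHG_CTOB_sample.txt' (hypothetical sample name)."""
    name = os.path.basename(path)
    parts = name.split("_", 2)
    if len(parts) < 3 or parts[0] not in CONTEXTS or parts[1] not in STRANDS:
        raise ValueError("unrecognised extractor file name: " + name)
    context, strand = parts[0], parts[1]
    return {
        "context": context,
        "strand": strand,
        "cytosine_strand": STRANDS[strand],
        # In a directional library only OT and OB reads are expected.
        "expected_in_directional": strand in ("OT", "OB"),
    }


if __name__ == "__main__":
    for fname in ("CpG_OT_sample.txt", "CHG_CTOB_sample.txt", "CHH_OB_sample.txt"):
        print(fname, describe_extractor_file(fname))
```

For a directional library, the CTOT and CTOB files should therefore contain little or no data, and the OT and OB files are the ones to look at.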

Once you're confident that you understand Bismark's output and the quality of your data, you may want to look at packages for downstream analysis, such as methylKit. Good overviews of the pitfalls and tools of methylation analysis can be found here and here and here.


Is there any difference in how reads are aligned for WGBS and RRBS using Bismark? A few points to consider include:

  • trimming 2 bases from the end of reads (trim_galore --rrbs)
  • skipping the deduplication step
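
The two points above could be sketched as the following command sequence, built here in Python. This is only an illustration under assumptions: all file and directory names are hypothetical placeholders, the trimmed FASTQ and BAM names must be replaced with whatever the tools actually write, and whether `--rrbs` trimming is appropriate depends on how the library was prepared. The point is simply that `--rrbs` is passed to Trim Galore and that no `deduplicate_bismark` step appears between alignment and methylation extraction.

```python
import shlex


def rrbs_commands(fastq, trimmed_fastq, bam, genome_dir):
    """Return the commands for a single-end RRBS run as argument lists.

    All four arguments are caller-supplied paths (hypothetical here).
    """
    return [
        # Adapter/quality trimming with the RRBS-specific end trimming enabled.
        ["trim_galore", "--rrbs", fastq],
        # Alignment; add --non_directional only for non-directional libraries.
        ["bismark", "--genome", genome_dir, trimmed_fastq],
        # Note: no deduplicate_bismark step here, per the second point above.
        ["bismark_methylation_extractor", "--bedGraph", bam],
    ]


if __name__ == "__main__":
    commands = rrbs_commands(
        fastq="sample.fastq.gz",          # hypothetical input
        trimmed_fastq="sample_trimmed.fq.gz",  # assumed name, check actual output
        bam="sample_trimmed_bismark_bt2.bam",  # assumed name, check actual output
        genome_dir="genome/",             # hypothetical Bismark-prepared genome folder
    )
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```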