Sequence Bias Bismark Output Interpretation
7.4 years ago
fusion.slope ▴ 250

Hello Community,

I generated the M-bias plot with Bismark, and my reads look as follows.

[Image: CpG Bias]

As far as I understand, the end of Read 1 shows a decrease at the 3' end for the CHG and CHH total calls (orange and green, respectively). The strange thing is Read 2, where I observe a drop immediately after the first 5 nucleotides for all of the methylation contexts (CHH total, CHG total, CHH methylation, etc.). How should I interpret this? To me it looks as if Read 2 is quite bad in terms of nucleotide composition, and only the CpG methylation bias shows a good pattern. Has anyone experienced a similar problem?

The Bismark tutorial clearly shows that Read 1 and Read 2 have similar patterns (see page 16 of https://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide.pdf).

Any comment is really appreciated. Cheers

Methylation Sequence Bias CpG M-Bias Plot • 8.0k views

Hi, we have run into the same problem. Did you solve yours? Thank you!

6.4 years ago
TEman ▴ 10

Hi,

What about the gradual decrease in CHH total and CHG total in R2? I see the same gradual decrease over R2 in my dataset. Does anyone have an explanation for this?


Here is the explanation for R2:

https://sequencing.qcfail.com/articles/library-end-repair-reaction-introduces-methylation-biases-in-paired-end-pe-bisulfite-seq-applications/

You can try to get better results with this Bismark command, which ignores the first 2 bases of Read 2 during methylation extraction:

bismark_methylation_extractor --ignore_r2 2 --gzip sample1_bismark_bt2_pe.bam
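One way to choose the value passed to --ignore_r2 is to look at the M-bias report and count how many leading positions deviate from the rest of the read. The sketch below is only an illustration: it assumes the report is tab-separated with a title line per context (for example "CpG context (R2)") followed by position, methylated count, unmethylated count, % methylation and coverage columns. The function names, the 5-point tolerance, and the toy numbers in `sample` are all hypothetical and should be checked against your own report.

```python
from statistics import median


def parse_mbias(text):
    """Parse M-bias report text into {section_title: [(position, pct, coverage), ...]}.

    Assumed layout (verify against your own file): a title line containing
    the word "context", a ruler line of "=", a header row, then data rows.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"="}:
            continue  # blank line or "=====" ruler
        if "context" in line:
            current = line
            sections[current] = []
            continue
        fields = line.split("\t")
        if current is None or len(fields) < 5 or not fields[0].isdigit():
            continue  # header row or unexpected line
        if fields[3] == "":
            continue  # no methylation value at this position
        sections[current].append((int(fields[0]), float(fields[3]), int(fields[4])))
    return sections


def leading_biased_positions(rows, tolerance=5.0):
    """Count leading positions whose % methylation differs from the
    read-wide median by more than `tolerance` percentage points."""
    if not rows:
        return 0
    med = median(pct for _, pct, _ in rows)
    n = 0
    for _, pct, _ in rows:
        if abs(pct - med) <= tolerance:
            break
        n += 1
    return n


# Toy numbers invented for illustration only, not real data.
sample = (
    "CpG context (R2)\n"
    "================\n"
    "position\tcount methylated\tcount unmethylated\t% methylation\tcoverage\n"
    "1\t3\t97\t3.00\t100\n"
    "2\t5\t95\t5.00\t100\n"
    "3\t70\t30\t70.00\t100\n"
    "4\t71\t29\t71.00\t100\n"
    "5\t69\t31\t69.00\t100\n"
)
rows = parse_mbias(sample)["CpG context (R2)"]
suggested_ignore_r2 = leading_biased_positions(rows)
print(suggested_ignore_r2)
```

On the toy input the first two positions are flagged, which would correspond to --ignore_r2 2. The plot itself should still be the final judge; a simple median threshold can be fooled by noisy or low-coverage positions.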

6.4 years ago
TEman ▴ 10

Thank you for your answer.

I get the bias at the very start (drop of methylated C). I have already trimmed the leading bases of R2 reads.

However, my concern is the gradual decrease in the CHH Total Calls and CHG Total Calls. Here is the QC plot from the Bismark User Guide (https://rawgit.com/FelixKrueger/Bismark/master/Docs/Bismark_User_Guide.html):

[Image: R2_CHH total bias]

I cannot find any discussion or explanation of this bias. Is it all due to trimming of low-quality bases at the 3' end of R2? The read length distribution of my trimmed R2 reads does not follow this pattern, as the vast majority of them are of the maximum length.


Hi TEman,

I came across a similar decrease in the total number of C calls and I share your concern. I tried processing R2 as single-end and the decrease disappeared.

I asked about it, and it turns out that this pattern arises because the overlap between R1 and R2 is ignored during methylation extraction, to avoid scoring methylation twice in the same fragment (the --no_overlap option). Only R1 is used in the overlapping region. Try running the analysis with --include_overlap; the decrease should disappear. That said, I think --no_overlap should always be used with PE reads, as recommended.
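To see why dropping the overlap produces a gradual decline rather than a sharp step, here is a toy model. It is not Bismark's code, and the function names, the 100 bp read length and the fragment length distribution are all assumptions made for illustration. Position 1 of R2 sits at the far end of the fragment, so later R2 positions move back towards R1 and are more likely to fall inside the region R1 already covers.

```python
import random


def r2_positions_kept(fragment_len, read_len):
    """Return the 1-based R2 positions that do NOT overlap R1.

    R1 covers fragment coordinates 1..read_len. R2 is read from the other
    end, so R2 position j maps to fragment coordinate fragment_len - j + 1.
    """
    r1_end = min(read_len, fragment_len)
    r2_len = min(read_len, fragment_len)
    kept = []
    for j in range(1, r2_len + 1):
        frag_pos = fragment_len - j + 1
        if frag_pos > r1_end:  # outside the region already covered by R1
            kept.append(j)
    return kept


def r2_call_profile(fragment_lens, read_len):
    """Count, per R2 position, how many fragments still contribute a call
    once the overlap with R1 has been discarded."""
    counts = [0] * read_len
    for frag_len in fragment_lens:
        for j in r2_positions_kept(frag_len, read_len):
            counts[j - 1] += 1
    return counts


# Hypothetical library: 10,000 fragments, mean length 180 bp, sd 40 bp.
random.seed(0)
fragment_lens = [max(50, int(random.gauss(180, 40))) for _ in range(10000)]
profile = r2_call_profile(fragment_lens, read_len=100)

# Print usable calls at a few R2 positions to show the gradual decline.
for pos in (1, 25, 50, 75, 100):
    print(pos, profile[pos - 1])
```

In this model each fragment contributes only a leading stretch of R2 positions, and that stretch is shorter for shorter fragments, so the summed count can only go down along the read. A broad fragment size distribution therefore gives a smooth decline even when nearly all trimmed R2 reads are full length.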


Hi Kramdi, thanks for the clarification. What I don't understand is why the CpG methylation seems normal for Read 2 if the drop in the other contexts (CHH, CHG) is due to the overlap not being called.


If you read the link I sent, it is explained there (even though the explanation is not very clear):

"The methylation state of the first couple of bases or Read 2 drops from the average ~70% down to ~3%, and this steep drop can certainly be explained by the filled-in unmethylated cytosines. Since there is no reason to believe that there would be a biological role for lower methylation at the start of reads we have to assume that this low-methylation is artefactual, and thus leaving this untreated will introduce, in this case several hundred thousand, incorrect methylation calls (and thus noise) into the dataset."


I am not sure if we understand each other. I have read that link.

Ignore the first 10 bases. Ignore the methylation. Just look at the CHH Total Calls line. It goes from ~4,500k at the beginning to ~1000-500k at the end. What happens in between? Why are there gradually fewer total CHH calls over the read?


Yes, I get your point, and sorry that I did not grasp your full concern the first time. What I can tell you from my experience is that in Read 2 there is always that kind of drop for CHG and CHH; why, I do not know. Since I was mostly interested in CpG (where the methylation values are stable along R2), I did not look into that problem in depth.
