Question

GC content in bisulfite-converted library

1

9.9 years ago

PoGibas 5.1k

I am confused about the GC content of a bisulfite-converted library.

The FastQC per base sequence content plot looks like this:

< image not found >

(%G has decreased and %A has increased, compared to the reference genome).

Shouldn't %C have decreased instead of %G?

Bismark reports that >90% of Cs in CHG and CHH contexts were methylated; however, people from the wet lab say that only CpG methylation is possible in this organism.
Seeing such a result (methylation in CHH and CHG), can we speculate that something went wrong with the bisulfite conversion?
Bismark/BS Seeker2 maps only those reads that have non-converted Cs (this is why we get a high CH methylation percentage). What could be the reason that reads with converted Cs don't map?
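
For readers less familiar with the three contexts mentioned above (CpG, CHG, CHH, where H is A, C or T), here is a minimal Python sketch of how a cytosine's context can be read off the reference sequence. The function name `cytosine_context` is made up for illustration, and the sketch only looks at the top strand; it is not how Bismark itself is implemented.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i of a top-strand sequence.

    Returns 'CpG', 'CHG', 'CHH', or None when the base is not a C or the
    context cannot be determined (sequence end, or an N in the window).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    nxt = seq[i + 1:i + 3]          # up to two bases downstream of the C
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CpG"
    if len(nxt) < 2 or nxt[0] not in "ACT":
        return None                 # too close to the end, or ambiguous base
    if nxt[1] == "G":
        return "CHG"
    if nxt[1] in "ACT":
        return "CHH"
    return None


contexts = [cytosine_context("ACGT", 1),
            cytosine_context("ACAGT", 1),
            cytosine_context("ACATT", 1)]
```

In an organism with CpG-only methylation, essentially all calls in the last two categories should come out unmethylated after a successful conversion.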

bisulfite • 4.5k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by PoGibas 5.1k

3

It looks as if reads have been (reverse) complemented.

ADD REPLY • link 9.9 years ago by dariober 15k

0

This is what we think too. If this is the data as we received it (Ion Torrent), is it possible that something got messed up at the base-calling stage?

ADD REPLY • link 9.9 years ago by PoGibas 5.1k

0

I have no experience with Ion Torrent, but I don't see why base calling should complement the reads. Are you sure this is a "standard" bisulfite library?

ADD REPLY • link 9.9 years ago by dariober 15k

score 4 · Accepted Answer · 2014-11-25

4

9.9 years ago

Devon Ryan 104k

That's really really strange. In all of my datasets the C percentage falls toward 0, causing T to jump to near 50%. Is this some sort of targeted BS-seq dataset? Did you run other samples at the same time and did they produce similar results?

In general, the C and T percentages should be pushed away from 25% by the bisulfite conversion, while the G and A percentages should still be around 25%. Not seeing that (and also seeing >90% CHH methylation when that's not expected) suggests pretty strongly to me that something either went very wrong during bisulfite conversion or the reads were treated in a very strange way prior to running FastQC. If you can confirm that no one monkeyed with the reads, then I would suggest being very hesitant in trusting this dataset.
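
To make the expected pattern concrete, here is a toy Python simulation. It is only a sketch under strong assumptions: a uniformly random "genome" (25% of each base), a fully unmethylated and fully converted library, and reads from the original strand only. The helper names are invented for illustration. It shows C dropping to 0% and T rising to about 50% after conversion, and that reverse complementing the converted reads moves the effect to G and A, which is the pattern described in the question.

```python
import random
from collections import Counter


def bisulfite_convert(read, methylated_fraction=0.0, rng=None):
    """In-silico conversion: every unmethylated C becomes T."""
    rng = rng or random.Random(0)
    return "".join(
        "T" if base == "C" and rng.random() >= methylated_fraction else base
        for base in read
    )


def reverse_complement(seq):
    """Reverse complement of an ACGT-only sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def base_percentages(seq):
    """Percentage of each base, comparable to a FastQC composition summary."""
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGT"}


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(100_000))  # toy reference

converted = bisulfite_convert(genome)      # what the reads should look like
flipped = reverse_complement(converted)    # same reads, reverse complemented

fwd = base_percentages(converted)  # C depleted, T enriched, G and A near 25%
rev = base_percentages(flipped)    # G depleted, A enriched, C and T near 25%
```

If the real data look like `rev` rather than `fwd`, that would be consistent with the reads having been reverse complemented somewhere upstream, though the simulation cannot say where.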

ADD COMMENT • link 9.9 years ago by Devon Ryan 104k

0

Under ideal circumstances, are the lines usually flat or zig-zag? In my case, C is close to 0% and T is higher up at around 50%, but they are not completely flat lines as shown in the examples here. Also, at position 1, C is close to 60%. Is there any reason to suspect that it may be targeted sequencing/RRBS?

ADD REPLY • link 3.3 years ago by Arindam Ghosh ▴ 530

1

It's not unusual for them to jump around a bit. C being closer to 60% at position 1 isn't uncommon; it's an artefact of the library prep, which had to fill in that base. You should exclude it during methylation extraction.
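
With Bismark, this kind of exclusion is what the `--ignore` option of `bismark_methylation_extractor` is for (it skips the given number of bases from the 5' end of read 1). To illustrate the effect, here is a small Python sketch on made-up methylation call strings in Bismark's notation (`Z`/`z` for a methylated/unmethylated CpG, `.` for everything else). The function name and the example strings are hypothetical, and the sketch assumes forward-strand reads, so that the 5' end is the start of the string.

```python
def cpg_methylation_percent(call_strings, ignore_5prime=0):
    """Percent CpG methylation over all reads, skipping the first
    `ignore_5prime` positions of each call string."""
    meth = unmeth = 0
    for calls in call_strings:
        trimmed = calls[ignore_5prime:]
        meth += trimmed.count("Z")     # methylated CpG calls
        unmeth += trimmed.count("z")   # unmethylated CpG calls
    total = meth + unmeth
    return 100.0 * meth / total if total else float("nan")


# Made-up reads where only the filled-in first base looks methylated.
calls = ["Z..z..z", "Z.z...z", "Zz....."]

all_positions = cpg_methylation_percent(calls)
skip_first = cpg_methylation_percent(calls, ignore_5prime=1)
```

In this toy input the first position alone accounts for all of the apparent methylation, which is why leaving it in can bias the estimate.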

ADD REPLY • link 3.3 years ago by Devon Ryan 104k