GC content in bisulfite converted library
9.9 years ago
PoGibas 5.1k

I am confused about the GC content of a bisulfite-converted library.

The FastQC per base sequence content looks like this:

[Image not available: FastQC per base sequence content plot]

(%G has decreased and %A has increased compared to the reference genome.)

  1. Shouldn't %C have decreased instead of %G?

    Bismark reports that >90% of Cs in CHG and CHH context were methylated. However, people from the wet lab say that only CpG methylation is possible in this organism.

  2. Seeing such a result (methylation in CHH and CHG), can we speculate that something went wrong with the bisulfite conversion?

  3. Bismark/BS Seeker2 map only those reads that have non-converted Cs (this is why we get a high CH methylation percentage). What could be the reason that reads with converted Cs don't map?

It looks as if reads have been (reverse) complemented.
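
To illustrate why reverse complementing would give that picture, here is a minimal sketch on a toy sequence. It assumes a completely unmethylated, fully converted template, and the helper names are made up for this example. Conversion depletes C and enriches T; reverse complementing the converted read turns that into G depletion and A enrichment, which matches what is described in the question.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def bisulfite_convert(seq):
    """Toy conversion: every C reads as T (assumes no methylation at all)."""
    return seq.replace("C", "T")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def base_fractions(seq):
    """Fraction of each of A, C, G, T in the sequence."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}

genome = "ACGT" * 25                      # toy template, 25% of each base
converted = bisulfite_convert(genome)     # C lost, T gained
flipped = reverse_complement(converted)   # G lost, A gained

print("converted:", base_fractions(converted))
print("flipped:  ", base_fractions(flipped))
```

On this toy input the converted read has no C and 50% T, while the reverse-complemented copy has no G and 50% A.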

This is what we think too. If this is the data as we received it (Ion Torrent), is it possible that something got messed up at the base-calling stage?

I have no experience with Ion Torrent, but I don't see why base calling would complement the reads. Are you sure this is a "standard" bisulfite library?

9.9 years ago

That's really, really strange. In all of my datasets the C percentage falls toward 0, causing T to jump to near 50%. Is this some sort of targeted BS-seq dataset? Did you run other samples at the same time, and did they produce similar results?

In general, the C and T percentages should be pushed away from 25% by the bisulfite conversion, while the G and A percentages should stay around 25%. Not seeing that (and also seeing >90% CHH methylation when that's not expected) suggests pretty strongly to me that either something went very wrong during bisulfite conversion or the reads were treated in a very strange way prior to running FastQC. If you can confirm that no one monkeyed with the reads, then I would suggest being very hesitant in trusting this dataset.
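
As a rough sanity check, something like the sketch below could be run on the overall base fractions taken from the FastQC report. The function name and the `low`/`high` cut-offs are arbitrary assumptions for illustration, not established thresholds. A G->A signature can also come from reads of the strand complementary to the original one, so it is a prompt to check the library type rather than proof that the reads were tampered with.

```python
def diagnose_composition(frac, low=0.05, high=0.40):
    """Classify overall base fractions of a read set.

    frac: dict mapping "A", "C", "G", "T" to fractions summing to ~1.
    low/high: illustrative cut-offs (assumptions, tune for your genome).
    """
    if frac["C"] < low and frac["T"] > high:
        return "C->T: looks like original-strand bisulfite reads"
    if frac["G"] < low and frac["A"] > high:
        return "G->A: looks like complementary-strand or reverse-complemented reads"
    return "no clear bisulfite signature"

# Made-up fractions, roughly the two situations discussed in this thread
expected = {"A": 0.25, "C": 0.01, "G": 0.25, "T": 0.49}
observed = {"A": 0.49, "C": 0.25, "G": 0.01, "T": 0.25}

print(diagnose_composition(expected))
print(diagnose_composition(observed))
```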

Under ideal circumstances, are the lines usually flat or zig-zag? In my case, C is close to 0 and T is higher, at around 50%, but they are not completely flat lines as shown in the examples here. Also, at position 1, C is close to 60%. Is there any reason to suspect that it may be targeted sequencing/RRBS?

It's not unusual for them to jump around a bit. C being close to 60% at position 1 isn't uncommon either; it's an artefact of the library prep, which had to fill in that base. You should exclude it during methylation extraction.
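
A small sketch of what that looks like, using made-up toy reads and hypothetical helper names. It computes the per-position C fraction (one line of the FastQC plot) and then the overall C fraction with and without the first base. The raw C fraction is only a crude stand-in for a real methylation call here.

```python
def per_position_fractions(reads, base):
    """Fraction of `base` at each read position across all reads."""
    length = max(len(r) for r in reads)
    fractions = []
    for i in range(length):
        # Only reads long enough to cover position i contribute
        column = [r[i] for r in reads if len(r) > i]
        fractions.append(column.count(base) / len(column))
    return fractions

def c_fraction(reads, ignore=0):
    """Overall C fraction, skipping the first `ignore` bases of every read."""
    bases = "".join(r[ignore:] for r in reads)
    return bases.count("C") / len(bases)

# Toy reads (invented): C is enriched at the first position only
reads = ["CTTTGA", "CTTAGT", "TTTGAT", "CATTTG", "TGTTAT"]

c_by_pos = per_position_fractions(reads, "C")
print(c_by_pos)                      # spike at position 1, flat afterwards
print(c_fraction(reads))             # inflated by the first base
print(c_fraction(reads, ignore=1))   # first base excluded
```

In Bismark, the methylation extractor has an `--ignore` option for skipping bases at the 5' end; check the documentation of your version for the exact usage.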
