Using PhiX to estimate bisulfite conversion rate
9.4 years ago
igor 13k

I was trying to use a PhiX spike-in to estimate the bisulfite conversion rate. When I processed my reads with Bismark, aligning to the mouse genome (this is a mouse sample), the results seemed reasonable:

C methylated in CpG context:    52.6%
C methylated in CHG context:    4.5%
C methylated in CHH context:    4.5%

Then I aligned the same sample against the PhiX genome. The alignment rate is less than 1%, as expected, but the methylation rates are odd:

C methylated in CpG context:    98.0%
C methylated in CHG context:    97.9%
C methylated in CHH context:    97.8%

Why is it the opposite of what it should be? PhiX should be unmethylated. The output is generated by Bismark, so it's not possible that I used the wrong formula, and the bisulfite conversion worked at least partially, as demonstrated by the sample of interest. What am I missing here?
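
For reference, the arithmetic behind using an unmethylated spike-in can be sketched as below. The reported percentage is methylated calls over all calls in a context, so on DNA that carries no methylation every remaining "methylated" call is an unconverted cytosine, and the conversion rate is the complement. The function name and the counts are illustrative assumptions, not values from this run.

```python
def conversion_rate(methylated_calls: int, unmethylated_calls: int) -> float:
    """Estimate bisulfite conversion (%) from calls on an unmethylated control."""
    total = methylated_calls + unmethylated_calls
    if total == 0:
        raise ValueError("no cytosine calls to evaluate")
    pct_methylated = 100.0 * methylated_calls / total
    # On an unmethylated control, every "methylated" call is a failed conversion.
    return 100.0 - pct_methylated


# Hypothetical counts chosen to mimic ~98% "methylated" calls on PhiX
print(round(conversion_rate(9800, 200), 1))  # 2.0
```

Read this way, ~98% methylation on PhiX would imply a conversion rate of about 2%, which cannot be reconciled with the ~4.5% non-CpG methylation seen in the mouse alignment.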

WGBS Bismark RRBS • 4.9k views

That is extremely odd. Are you getting a methylation_ratio file for the PhiX genome? If so, look directly at the values to see if there's an error in the calculation.

Also, where in the pipeline did you add the PhiX and are you sure it is an unmethylated variety (both may be sold)?
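
The direct check of the per-position values suggested above can be sketched as follows. This assumes the file is a Bismark coverage file, i.e. tab-separated with the columns chromosome, start, end, percent methylation, methylated count, unmethylated count; if the file you have is laid out differently, the column indexes need adjusting. The function name and the example rows are made up for illustration.

```python
import csv
import io


def overall_methylation(cov_handle) -> float:
    """Recompute overall % methylation from per-position counts."""
    meth = unmeth = 0
    for row in csv.reader(cov_handle, delimiter="\t"):
        if not row:
            continue
        # Assumed columns: chrom, start, end, pct, count_methylated, count_unmethylated
        meth += int(row[4])
        unmeth += int(row[5])
    total = meth + unmeth
    if total == 0:
        raise ValueError("no counts found")
    return 100.0 * meth / total


# Made-up rows for illustration only; pass an open file handle in practice
example = io.StringIO(
    "phiX\t10\t10\t100\t5\t0\n"
    "phiX\t25\t25\t90\t9\t1\n"
)
print(round(overall_methylation(example), 2))
```

If the recomputed value matches the summary report, the calculation itself is probably not the problem.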

9.4 years ago

PhiX can be added to an Illumina sequencing run at different steps:

  1. BEFORE the bisulfite treatment, as a way to estimate the C-to-U conversion.
  2. AFTER the bisulfite conversion, as a way to balance the base composition; otherwise Illumina base calling will not work properly. In this latter case, and depending on the software version, you need to add from 10 to 50% PhiX DNA to your samples (or you can run a separate lane with only PhiX DNA as a control). More information HERE and HERE

So if you have not done the sequencing yourself, it is likely that the PhiX sequence you are analyzing was added after the bisulfite treatment. That addition is mandatory.


Seems like a good explanation. However, I would say that 4.5% methylation in non-CpG context is a little too high; we usually see <1%, though it might be expected in this case, of course. In contrast, if PhiX was added later, ~98% C seems too low; I would expect >99%. Maybe worth checking the sequence quality?


I agree. In my hands, sequence quality plays a major role in the analysis of GC content. It would not be the first time that a peak or valley with a different GC content in a FastQC report disappears after trimming the sequences for quality.

My question: could 98% be fair for what you expect of a sequencing platform that is nice, but still far from perfect? Or are you still confident that you must get >99%?


"98% could be fair for what you expect of a sequencing platform that is nice, but still far from perfect" It's not unusual these days to have reads with quality consistently above Q30, especially on MiSeq (I'm talking about Illumina platforms here), which translates to an error rate of 0.1% or less, i.e. at least 99.9% C. A FastQC report from the OP would be helpful...
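
The Q30 figure follows from the standard Phred definition, Q = -10 * log10(p), where p is the per-base error probability. A minimal sketch of the conversion (the function name is just for illustration):

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


for q in (20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: error {p:.4%}, accuracy {1 - p:.4%}")
```

So a base at Q30 has a 1 in 1000 chance of being miscalled, which is why ~98% looks low for unconverted PhiX if sequencing error were the only source of non-C calls.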


The quality is great. That is not the issue.


This was added before bisulfite treatment specifically to measure bisulfite conversion rates.

The PhiX added during sequencing does not get indexed, so it would disappear after demultiplexing.


It will be hard to find a logical explanation, though.


I think the indexing of PhiX is not really an issue. One can align to PhiX the reads that failed to demultiplex, which would be a mixture of junk and real PhiX.
