Question

Inconsistent Methylation Levels Between Pairs in WGBS Data After Moving to NovaSeq X

2

5 weeks ago

Michal ▴ 20

Hi everyone,

I'm analyzing some WGBS data and have noticed a discrepancy in methylation levels between sequencing pairs since moving to the new NovaSeq X.

Previously, methylation levels were consistent between pair one and pair two at around 75%. However, with the NovaSeq X, pair two now shows only 50% methylation, while pair one remains at approximately 75%.

When examining the data in IGV, the alignment quality is high in non-CpG bases for both pairs. At CpG bases, however, there is a frequent discrepancy: pair one consistently registers a C (indicating a methylated base) with a high quality score, whereas pair two frequently calls a T at the same CpG site with a much lower quality score.

This suggests there may be a readout issue with the "dark base" G, which seems underrepresented in pair two. Could this be related to a compensation issue or an incorrect background level setting for the dark base in pair two on the NovaSeq X?
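One way to put numbers on the IGV observation is to tally CpG calls per pair, with and without a base-quality cutoff. The following is only a minimal sketch on invented toy data: the function name `methylation_by_mate`, the `(mate, call, baseq)` record layout, and the cutoff of 20 are all assumptions for illustration, and the calls are assumed to be already projected onto the bisulfite strand by whatever methylation caller is in use.

```python
from collections import defaultdict


def methylation_by_mate(calls, min_baseq=0):
    """Return {mate: fraction of CpG calls read as 'C'} for calls passing min_baseq.

    calls: iterable of (mate, call, baseq) where mate is 1 or 2, call is
    'C' (read as methylated) or 'T' (read as unmethylated), and baseq is
    the Phred quality of that base call. This layout is hypothetical.
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for mate, call, baseq in calls:
        # Skip low-quality calls and anything that is not a C/T call.
        if baseq < min_baseq or call not in ("C", "T"):
            continue
        total[mate] += 1
        if call == "C":
            meth[mate] += 1
    return {mate: meth[mate] / total[mate] for mate in total}


# Toy calls, invented for illustration only (not real data).
toy_calls = [
    (1, "C", 37), (1, "C", 37), (1, "C", 36), (1, "T", 37),
    (2, "C", 37), (2, "C", 36), (2, "C", 37), (2, "T", 37),
    (2, "T", 9), (2, "T", 11),
]

print(methylation_by_mate(toy_calls))                # {1: 0.75, 2: 0.5}
print(methylation_by_mate(toy_calls, min_baseq=20))  # {1: 0.75, 2: 0.75}
```

If the pair-two level moves back toward the pair-one level once low-quality calls are dropped, that would be consistent with miscalled bases rather than a real difference in methylation, though it would not by itself identify the cause.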

Has anyone encountered similar issues with the NovaSeq X, or does anyone have advice on troubleshooting or potential adjustments?

Thanks for any insights!

X WGBS NovaSeq • 716 views

updated 5 weeks ago by GenoMax 148k • written 5 weeks ago by Michal ▴ 20

3

Take a look at "New Illumina error mode, new BBTools release (39.09) to deal with it". It is a long thread, so read through it completely.

Your data likely suffers from the dreaded poly-G issue that seems to specifically affect NovaSeq X. See if pre-processing your data with the tool from that BBTools release is able to address the issue.

5 weeks ago by GenoMax 148k

0

Thanks! It feels like I have exactly the opposite problem: real Gs are overcompensated and called as Ts. Would that make sense?

5 weeks ago by Michal ▴ 20

0

You may want to see how your data reacts to polyfilter.sh. If your data does suffer from the poly-nucleotide issue, the aligner may produce different alignments for some reads once they are cleaned up, which would change the overall result you see.

As discussed in the other thread, while there may be a lot of NovaSeq X data out there, the poly-nucleotide call issue seems to be under-reported or under-appreciated. If your data does not have this issue, then let us know.

5 weeks ago by GenoMax 148k

0

We have exactly the opposite problem: too few Gs ;). The alignments are very good. It is just that where there should be a G, there is often a T with low quality.
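A rough way to check the "too few Gs" impression is to compare the G fraction of pair-two reads between the run that looked fine and an affected run. This is a sketch only: `base_fraction` is a hypothetical helper and the sequences are invented. Note also that, assuming a directional WGBS library, pair two is expected to be G-poor by design, so the comparison is only meaningful between runs of the same library type, not against an unconverted sample.

```python
def base_fraction(seqs, base="G"):
    """Return the fraction of all bases in `seqs` that equal `base`."""
    seqs = list(seqs)  # allow generators; we iterate twice
    total = sum(len(s) for s in seqs)
    if total == 0:
        return 0.0
    hits = sum(s.upper().count(base) for s in seqs)
    return hits / total


# Invented pair-two sequences, for illustration only.
good_run_r2 = ["AATGATTAGA", "TTAGAATGTA"]
affected_run_r2 = ["AATTATTAGA", "TTATAATTTA"]

print(base_fraction(good_run_r2))      # 0.2
print(base_fraction(affected_run_r2))  # 0.05
```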

5 weeks ago by Michal ▴ 20

0

polyfilter.sh can take care of other poly-nucleotide stretches as well (for example poly-T, which may not be real; use polymers=GCAT).
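Before running the filter, a quick look at how many reads carry a long homopolymer run can indicate whether the issue is present at all. The sketch below is not a reimplementation of polyfilter.sh and makes no claim about how that tool works internally; the function names and the `min_run=20` threshold are assumptions, and the `polymers` argument merely mirrors the parameter name mentioned above.

```python
from itertools import groupby


def longest_homopolymer(seq, polymers="GCAT"):
    """Return (base, length) of the longest single-base run drawn from `polymers`."""
    best_base, best_len = "", 0
    for base, run in groupby(seq.upper()):
        n = sum(1 for _ in run)
        if base in polymers and n > best_len:
            best_base, best_len = base, n
    return best_base, best_len


def fraction_with_run(seqs, min_run=20, polymers="GCAT"):
    """Return the fraction of reads whose longest homopolymer is >= min_run."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    flagged = sum(
        1 for s in seqs if longest_homopolymer(s, polymers)[1] >= min_run
    )
    return flagged / len(seqs)


# Invented reads, for illustration only.
toy_reads = [
    "ACGT" * 10,                      # no long run
    "ACGTAC" + "G" * 25 + "TTAC",     # poly-G stretch
    "T" * 22 + "ACGTACGT",            # poly-T stretch
]

print(longest_homopolymer(toy_reads[1]))            # ('G', 25)
print(fraction_with_run(toy_reads))                 # 2 of 3 reads flagged
print(fraction_with_run(toy_reads, polymers="G"))   # 1 of 3 reads flagged
```

Running this separately on pair one and pair two would show whether homopolymer stretches are concentrated in one of the pairs.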

5 weeks ago by GenoMax 148k

1

Is this only happening on a single run or has this really affected every sample?

5 weeks ago by Devon Ryan 105k

0

This happened on multiple runs. One run, which was underloaded, was OK.

5 weeks ago by Michal ▴ 20

2

"which was underloaded"

That may be an interesting clue. I wonder if the basecaller has a problem with borderline-overloaded samples, which could be causing what you see.

5 weeks ago by GenoMax 148k

0

That sounds like something. How would I detect it? (sorry for my lack of knowledge)

5 weeks ago by Michal ▴ 20

0

You would need to involve the sequencing provider. They will have to get Illumina tech support to look at the affected runs (this can generally be done remotely; otherwise they will need to send some diagnostic files to Illumina support). Tell them the specific issue and see if they have any pointers. You may need to be patient and persistent, since this all requires a time investment on the part of the provider.

5 weeks ago by GenoMax 148k