Hi everyone,
I'm analyzing some WGBS data and have noticed a discrepancy in methylation levels between sequencing pairs since moving to the new NovaSeq X.
Previously, methylation levels were consistent between pair one and pair two at around 75%. However, with the NovaSeq X, pair two now shows only 50% methylation, while pair one remains at approximately 75%.
When examining the data in IGV, the alignment quality is high at non-CpG bases for both pairs. However, CpG positions frequently differ between the pairs: pair one consistently registers a C (indicating a methylated base) with a high quality score, whereas pair two frequently calls a T at the same CpG site with a much lower quality score.
This suggests there may be a readout issue with the "dark base" G, which seems underrepresented in pair two. Could this be related to a compensation issue or an incorrect background level setting for the dark base in pair two on the NovaSeq X?
Has anyone encountered similar issues with the NovaSeq X, or does anyone have advice on troubleshooting or potential adjustments?
Thanks for any insights!
Take a look at the thread "New Illumina error mode, new BBTools release (39.09) to deal with it". It is a long thread, so read through it completely.
Your data likely suffers from the dreaded poly-G issue that seems to specifically affect NovaSeq X. See if pre-processing your data with this tool is able to address the issue.
Thanks! It feels like I have exactly the opposite problem: real Gs are overcompensated and called as Ts. Would that make sense?
You may want to see how your data reacts to polyfilter.sh. If your data does suffer from the poly-nucleotide issue, the aligner may place certain reads differently once they are cleaned up, which could change the overall result you see. As discussed in the other thread, although there is a lot of NovaSeq X data out there, the poly-nucleotide call issue seems to be under-reported or under-appreciated. If your data does not have this issue, let us know.
We have exactly the opposite problem: too few Gs ;). The alignments are very good. It is just that where there should be a G, there is often a T with low quality.
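A rough way to check the "too few Gs" observation directly on the FASTQ files, before any alignment, is to tally how often each base is called and its mean quality, separately for pair one and pair two. The sketch below is only an illustration under some assumptions: plain 4-line FASTQ records, Phred+33 qualities, and hypothetical helper names (read_fastq, base_stats, open_fastq) that are not part of any tool mentioned in this thread. Keep in mind that in a directional bisulfite library pair two is expected to be G-poor anyway, so the numbers are most meaningful when compared against the run that looked fine.

```python
import gzip
from collections import Counter


def read_fastq(lines):
    """Yield (sequence, quality_string) from an iterable of FASTQ lines.

    Assumes standard 4-line records (header, sequence, '+', qualities).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield seq, qual


def base_stats(records, phred_offset=33):
    """Return {base: (fraction_of_all_calls, mean_quality)}.

    phred_offset=33 is an assumption (the usual Illumina FASTQ encoding).
    """
    counts = Counter()
    qual_sums = Counter()
    for seq, qual in records:
        for base, q in zip(seq, qual):
            counts[base] += 1
            qual_sums[base] += ord(q) - phred_offset
    total = sum(counts.values())
    return {b: (counts[b] / total, qual_sums[b] / counts[b]) for b in counts}


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For example, calling base_stats(read_fastq(open_fastq(path))) once for the pair-one file and once for the pair-two file of an affected run, and again for the underloaded run, would show whether the G fraction or the mean G quality in pair two drops only in the affected runs.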
polyfilter.sh can take care of other poly-nucleotide stretches as well (for example poly-T stretches that may not be real; use polymers=GCAT).
Is this only happening on a single run, or has this really affected every sample?
This happened on multiple runs. One run was ok (which was underloaded).
That may be an interesting clue. I wonder if the basecaller is having a problem with borderline overloaded samples which is causing the observation you see.
That sounds like something. How would I detect it? (sorry for my lack of knowledge)
You would need to involve the sequencing provider. They will have to get Illumina tech support to look at the affected runs (this can generally be done remotely; otherwise they will need to send some diagnostic files to Illumina support). Tell them the specific issue and see if they have any pointers. You may need to be patient and persistent, since all of this requires a time investment on the part of the provider.