Question

What Does The Community Think Of "Sequence-Specific Error Profile Of Illumina Sequencers"

13.9 years ago

Nick Loman ▴ 610

I just finished reading "Sequence-specific error profile of Illumina sequencers" and I found it extremely interesting. It suggests that Illumina data suffers from quite serious systematic errors which may even affect SNP calls. I believe this will come as a surprise to the sequencing community. We are comfortable with 454 data having systematic homopolymeric tract problems, and with Illumina having high GC% issues, but I don't believe this particular issue has been described before.

I need to read it a few more times before I summarise my reaction; I will reply to my own question when I have.

http://nar.oxfordjournals.org/content/early/2011/05/14/nar.gkr344.full

What do you think about this paper? Do you believe there truly are such systematic issues with Illumina data? Is there any other explanation for the observed results?

illumina error next-gen sequencing • 9.2k views

Good topic. Community wiki?

David Quigley 11k • 13.9 years ago

Yes. See also related post.

Casey Bergman 18k • 13.9 years ago (updated 5.5 years ago by Ram 45k)

score 7 · Answer 1 · 2011-05-19

I was recently at the Cold Spring Harbor Biology of Genomes conference, where this topic was presented by Meromit Singer (UC Berkeley). She said the effect seemed to be unique to Illumina reads. Coincidentally, a representative of Illumina was in the audience and responded that the company knows about this, that reads from other systems could show something similar, and, importantly, that Illumina will have a fix in the chemistry behind this problem in a short time.

So, yes, after hearing the talk, I do believe that there are such systematic errors in Illumina data. I feel they are small, based on the CSHL talk I heard. No concrete alternative explanation was given.

(I don't have such data to analyze first-hand and so my opinion comes from what I observed at CSHL last week.)

score 5 · Answer 2 · 2011-05-19

13.9 years ago

Nick Loman ▴ 610

Bastien Chevreux just pointed me to the following resource!

http://chevreux.org/GGCxG_problem.html

So it seems at least this part of it is not new ...

score 3 · Answer 3 · 2011-05-20

Illumina is pushing an update of their chemistry that could improve these issues. Elliott Margulies (NHGRI) had the chance to analyse results obtained with this update for human samples, and the results he presented at the Genomics of Rare Diseases meeting in Hinxton looked much better. The update basically gets rid of the GC-bias problems at the level of coverage when mapping, which I am guessing is at least partly because there are fewer GGCxG issues in there.

score 3 · Answer 4 · 2011-05-20

Nice paper and it was about time that this appeared in a respected journal.

However, I wonder whether the use of Google or other search engines has fallen into disgrace with both authors and reviewers. Try searching any of the following terms on Google, Bing, Yahoo, ...

"solexa ggc" or "illumina ggc" or "illumina ggc motif" (all of these even without quotes)

If it's not the top hit itself, then it's in the top 5: a link to either a 2009 discussion of the GGC or GGCxG motif on the SEQanswers board or, even better, a direct link to the chapter on Illumina sequence assembly in the MIRA documentation on SourceForge (see here), which talks about exactly these issues (complete with screenshots of assemblies affected by this).

And that's been documented since 2009/2010 and MIRA has parameters turning on routines which minimise the impact of these things on SNP calling.
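For anyone who wants to see where the motif sits in their own reference or reads, here is a minimal sketch in Python. It is not MIRA's routine and makes no claim about how MIRA or the paper detect the problem; the function name `find_ggcxg` is hypothetical, and it scans only the strand it is given.

```python
import re

# Zero-width lookahead so that overlapping occurrences
# (e.g. the two hits in "GGCGGCGG") are all reported.
_GGCXG = re.compile(r"(?=GGC[ACGT]G)")


def find_ggcxg(seq):
    """Return 0-based start positions of GGCxG motifs on the given strand."""
    return [m.start() for m in _GGCXG.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_ggcxg("ATGGCAGTT"))
```

Candidate SNP positions that fall just after one of these starts would be the ones to double-check by eye or against reads from the opposite strand.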

Now, if someone publishes a paper on how the data from Illumina between Q3 2009 and the advent of TruSeq kits showed a strong bias in coverage which is dependent on GC content, again without even acknowledging the MIRA documentation, I'll start to weep in the corner.

score 1 · Answer 5 · 2011-05-19

Actually, I'm mildly suspicious of all of them. I remember a talk at ASHG a couple of years back that included an analysis of several platforms, and the concordance among them was much less than we would like to see. Can't find my notes on that right now. Will keep looking.

So when I'm using any of the data I like to see multiple occurrences of a SNP via different projects, for example. I'm not suggesting I'd dismiss novel SNPs out of hand. But I wouldn't bet the rest of my career on one without double-checking.

The technology will continue to get better and more reliable--and certainly has since I heard that talk. But some of the early stuff I'm particularly wary of.

Ram · Answer 6 · 2016-01-27

In my study, the results were slightly different.