Can anyone explain to me in detail next-generation sequence analysis and third-generation sequence analysis?
I recommend you go and read these blog posts by Luke Jostins:
Or this open-access review:
A window into third-generation sequencing (PDF)
Briefly, "generation" refers to the chemistry and technology used by the sequencing process. First generation generally refers to Sanger sequencing. "Next-generation", when you think about it, is a meaningless term, but it is generally used to refer to any of the high-throughput methods developed after Sanger (e.g. 454, Illumina). And no one quite knows what third generation means, but some people use the term to refer to single-molecule methods.
I think the important thing is to understand the underlying process behind each sequencing procedure and not worry too much about silly jargon and buzz phrases.
There is no clear definition of 2nd and 3rd generation sequencing, but there are two common views:
The data analysis will depend more on the specific technology than on which generation that technology is classified as. Second and third generation are both high-throughput, but different technologies have different error rates and error patterns. For example:
Most of the sequence analysis since 2006 has focused on dealing with very short reads. If you recall, Solexa started out with <30bp reads, not much longer than MPSS. After adapter trimming, only about 20-25bp of usable sequence was left. The BLAT website would not even accept such short reads.
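To make "adapter trimming" concrete, here is a deliberately naive sketch (real trimmers such as cutadapt do error-tolerant, partial-overlap matching; the adapter string below is just an illustrative example sequence):

```python
# Naive adapter trimming: strip everything from the first exact
# occurrence of the adapter sequence onward. With ~30bp raw reads,
# this is why only ~20-25bp of usable sequence remained.
def trim_adapter(read, adapter):
    """Return the read with the adapter (and anything after it) removed."""
    i = read.find(adapter)
    return read[:i] if i != -1 else read

print(trim_adapter("ACGTACGTAGATCGGAAGAG", "AGATCGGAAGAG"))  # ACGTACGT
print(trim_adapter("ACGTACGT", "AGATCGGAAGAG"))              # ACGTACGT (no adapter found)
```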
Although SBS read lengths have been growing ever since, the aligners and assemblers that were developed to deal with short reads still tended to be highly indexed and very stringent. Bowtie and MAQ did not accept indels. Period. Think about that.
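To see why "highly indexed and very stringent" rules out indels, here is a toy seed-and-verify aligner in that style. This is not the actual MAQ or Bowtie algorithm (those use spaced seeds and an FM-index, respectively); it is just a minimal sketch of the hash-index-plus-ungapped-check idea:

```python
# Toy ungapped, index-based short-read alignment: hash every k-mer
# position in the reference, seed with the read's first k-mer, then
# verify each candidate position counting mismatches only.
from collections import defaultdict

def build_index(ref, k):
    """Map each k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def align_ungapped(read, ref, index, k, max_mismatches=2):
    """Return (position, mismatch_count) hits for the read.
    A single indel in the read shifts every downstream base, so the
    mismatch count explodes and the read is lost -- exactly the
    limitation of the early stringent short-read aligners."""
    hits = []
    for pos in index.get(read[:k], []):
        window = ref[pos:pos + len(read)]
        if len(window) < len(read):
            continue
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

ref = "ACGTACGTGGATCCTTAGCAGT"
index = build_index(ref, k=4)
print(align_ungapped("ACGTGGAT", ref, index, k=4))  # [(4, 0)] exact hit
print(align_ungapped("ACGTGGCT", ref, index, k=4))  # [(4, 1)] one substitution, still found
print(align_ungapped("ACGTGATC", ref, index, k=4))  # [] one deleted base shifts the rest; read lost
```

Note how the substitution is tolerated but the deletion is not: substitutions stay local, while an indel breaks the fixed read-to-reference register that the ungapped check depends on.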
The verbose alignment formats that worked for BLAST now seem like flowery Victorian love letters.
With longer reads the tide is shifting back toward being more sensitive in detecting and dealing with larger indels. Aligning and assembling single-molecule reads will be a more Bayesian and error-tolerant affair and will require more biological knowledge to be translated into the software. There is also weird stuff like phased reads from PacBio, which will require some retooling. And direct detection of methylation, if that pans out.