Question

What read lengths are produced by modern Illumina sequencers?

6

11.2 years ago

John Smith ▴ 320

I am working with a program that generates artificial FASTQ files from a given reference genome and it allows for read length customization. I am trying to produce several FASTQ files with different read lengths. Hence, I want to know what read lengths are produced by real sequencers such as those from Illumina. I found this site which states the maximum read length of some sequencers.

However, when stating, for example, that the maximum read length is 2 x 150 bp, is that supposed to mean that each end of a contig will be 150 bp long? If so, could the sequencer be set up to produce reads of any length shorter than 150 bp, i.e. 2 x 1 bp, 2 x 2 bp, 2 x 3 bp, ..., 2 x 148 bp, 2 x 149 bp? Lastly, which of those sequencers is most commonly used for genomes about as long as, say, that of E. coli?

read-length dna illumina • 45k views

updated 3.6 years ago by Ram 45k • written 11.2 years ago by John Smith ▴ 320

2

You can also take a look at this link. It has the best explanation of frequently used terminology: library (contig), insert, fragments, etc.

updated 3.8 years ago by Ram 45k • written 11.2 years ago by poisonAlien ★ 3.2k

3

11.2 years ago

Ido Tamir 5.2k

The read lengths are mostly determined by the kits that are available for purchase for the instrument:

http://www.illumina.com/systems/miseq/kits.ilmn

http://www.illumina.com/systems/nextseq-sequencer/kits.ilmn

It is possible to sequence shorter reads than the kit allows for, but this would be a waste of money, because it's not possible to store the leftover reagents.

If no kit is available for the desired length, then multiple kits are used, e.g. 2x50 for SR100.

updated 3.8 years ago by Ram 45k • written 11.2 years ago by Ido Tamir 5.2k

Ram · Accepted Answer · 2014-06-19

However, when stating, for example, that the maximum read length is 2 x 150 bp, is that supposed to mean that each end of a contig will be 150 bp long?

No. A contig is something else: a contiguous consensus sequence assembled from many overlapping reads.

2 x 150 bp means that you have two 150 bp reads of sequence data from a single piece of DNA, one from each end. The two reads are separated by a gap whose length is not known exactly and depends on the insert size (usually 200 to 1200 bp), which is size-selected during library prep. This picture might make things clearer.

Mapping Reads by Suspencewl - Own work. Licensed under CC0 via Wikimedia Commons.

The blown-up read pair is 2 x 35 bp. The insert size (bp between the sequencing adapters) is 400-500 bp. All of the reads in the picture could be used to assemble a single contig based on the consensus sequence.

For Illumina sequencing, the read length is specified by the reagent kit, so you have limited flexibility there.
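Since the question is about simulating reads, here is a minimal Python sketch of how one 2 x 35 pair could be cut from a fragment. It assumes the convention used above, where the insert size is the full fragment length between the adapters, and that the second read is reported as the reverse complement of the fragment's far end. The function names, the random reference, and the 450 bp insert are made up for illustration and are not taken from any particular simulator.

```python
import random

# Complement table for reverse-complementing the second read.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(reference: str, start: int, insert_size: int, read_len: int):
    """Return (read1, read2) for one fragment of insert_size bp.

    read1 is the first read_len bases of the fragment; read2 is the
    reverse complement of the last read_len bases.
    """
    if read_len > insert_size:
        raise ValueError("read_len must not exceed insert_size")
    fragment = reference[start:start + insert_size]
    if len(fragment) < insert_size:
        raise ValueError("fragment runs past the end of the reference")
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


# Hypothetical 2 kb random reference, purely for demonstration.
random.seed(0)
reference = "".join(random.choice("ACGT") for _ in range(2000))

r1, r2 = paired_end_reads(reference, start=100, insert_size=450, read_len=35)
inner_distance = 450 - 2 * 35  # unsequenced bases between the two reads
print(len(r1), len(r2), inner_distance)  # prints: 35 35 380
```

With a 450 bp insert and 2 x 35 reads, 380 bp in the middle of the fragment are never sequenced, which is the gap between the two reads.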

The MiSeq is capable of 15 Gb of sequence output. The E. coli genome is ~4.6 Mb in length so in a single run a MiSeq could easily sequence an E. coli genome at ultra-high depth.
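To put a rough number on "ultra-high depth", the average depth can be estimated as total sequenced bases divided by genome size. The sketch below uses only the two figures quoted above and assumes, optimistically, that every sequenced base is usable and that the whole run is spent on one genome; the function name is illustrative.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth = total sequenced bases / genome size."""
    return total_bases / genome_size


miseq_output_bp = 15e9   # 15 Gb per run, as quoted above
ecoli_genome_bp = 4.6e6  # ~4.6 Mb E. coli genome, as quoted above

depth = mean_coverage(miseq_output_bp, ecoli_genome_bp)
print(f"{depth:.0f}x")  # prints: 3261x
```

That is roughly two orders of magnitude more than a bacterial assembly typically needs, which is why many samples are usually multiplexed on a single run.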

score 10 · Accepted Answer · 2014-06-19

The read lengths (up to the instrument's maximum) are somewhat configurable. However, some very commonly used lengths have predominated at different times. In the early days you would see a lot of 2x36, 2x50, and 2x76 from the Illumina GA or GAII. With the HiSeq 2000/2500 it was most common to see 2x100 or 2x150 (you also often see 2x101 or 2x151). The MiSeqs often produce 2x250 or 2x300. The HiSeq 4000, HiSeq X, and NovaSeq 5000/6000 seem to be run most often at 2x150. I would probably go with 2x100 for your simulations.
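For the simulation itself, it may help to work out how many read pairs to generate at each of these lengths so that every FASTQ file reaches the same depth. The sketch below is only an illustration: the 50x target depth is an arbitrary assumption, the genome size is the ~4.6 Mb E. coli figure mentioned in this thread, and `read_pairs_needed` is a hypothetical helper rather than part of any simulator.

```python
import math


def read_pairs_needed(genome_size: int, target_depth: float, read_len: int) -> int:
    """Number of read pairs so that pairs * 2 * read_len >= genome_size * target_depth."""
    return math.ceil(genome_size * target_depth / (2 * read_len))


ECOLI_GENOME_BP = 4_600_000  # ~4.6 Mb, as mentioned in this thread
TARGET_DEPTH = 50            # assumed target depth, chosen only for illustration

# Read lengths named in the answer above.
for read_len in (36, 50, 76, 100, 150, 250, 300):
    n = read_pairs_needed(ECOLI_GENOME_BP, TARGET_DEPTH, read_len)
    print(f"2x{read_len}: {n:,} read pairs")
```

At 2x100 and 50x depth this comes to 1,150,000 read pairs; longer reads need proportionally fewer pairs for the same depth.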