Question

Number of passing clusters vs. number of read pairs vs. total number of reads

6.4 years ago

AP ▴ 100

Hi all,

I apologize for a rather basic question but I am confused about the terminology.

What is really the difference between:

Number of clusters passing filter
Number of clusters
Number of read pairs per lane
Total number of reads

For instance, the HiSeq 4000 should produce about 300M reads per lane. What does that mean exactly? If I sequence at PE150, does that mean the total number of expected reads should be 600M? This is quite important when budgeting a project. Would 75,000 fragments at 20X coverage require 1,500,000 reads, and so 0.0025 lanes of a HiSeq 4000 (1,500,000/600M)?

Any help clarifying this would be highly appreciated!

illumina flowcell reads • 6.4k views

ADD COMMENT • link updated 6.4 years ago by GenoMax 151k • written 6.4 years ago by AP ▴ 100

score 1 · Answer 1 · 2019-01-17

6.4 years ago

GenoMax 151k

Number of clusters: library fragments anchored to the flowcell that are capable of producing sequence. The number is fixed for patterned flowcells but variable for other flowcells; it also depends on library quality.
Number of clusters passing the chastity filter: clusters that survive the initial filter in Illumina data processing, which checks for e.g. a pure signal and a certain quality.

Chastity is defined as the ratio of the brightest base intensity divided by the sum of the brightest and second brightest base intensities. Clusters “pass filter” if no more than 1 base call has a chastity value below 0.6 in the first 25 cycles.
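The chastity rule above can be sketched in a few lines of Python. This is only an illustration of the definition as quoted, not Illumina's actual implementation; the function names and the per-cycle input format (one tuple of four base intensities per cycle) are assumptions made for the example.

```python
def chastity(intensities):
    """Chastity for one cycle: brightest / (brightest + second brightest)."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    # Guard against an all-zero cycle (assumption: treat it as failing).
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, n_cycles=25, max_failures=1):
    """Return True if the cluster passes the chastity filter.

    cycles: list of per-cycle intensity tuples, e.g. (A, C, G, T).
    Only the first n_cycles are inspected; at most max_failures of them
    may have a chastity value below the threshold.
    """
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures


# Made-up intensities: a clean cluster, and one with two mixed-signal cycles.
clean = [(100, 10, 5, 5)] * 25
mixed_cycle = (50, 45, 5, 5)  # chastity = 50 / 95, below 0.6
print(passes_filter(clean))
print(passes_filter([mixed_cycle, mixed_cycle] + clean[2:]))
```

A cluster with a single low-chastity cycle in the first 25 would still pass under this rule; two or more would not.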

Number of read pairs per lane = Number of clusters passing filter in that lane (x2, if counting actual reads)

Illumina generally counts both reads of a pair, so for paired-end runs the number of unique library fragments is usually only half the quoted number of reads.
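To make the bookkeeping concrete, here is a minimal sketch of how the three numbers relate. The function name and the returned keys are made up for illustration; the only facts used are the ones stated above (one read pair per cluster passing filter, two reads per pair).

```python
def read_counts(clusters_pf, paired_end=True):
    """Relate clusters passing filter to fragments and total reads.

    Each cluster passing filter corresponds to one library fragment.
    A paired-end run yields two reads per fragment, a single-end run one.
    """
    reads_per_cluster = 2 if paired_end else 1
    return {
        "fragments": clusters_pf,  # also the number of read pairs for PE runs
        "total_reads": clusters_pf * reads_per_cluster,
    }


# Example with 312.5M clusters passing filter in one lane.
print(read_counts(312_500_000))
print(read_counts(312_500_000, paired_end=False))
```

So when a vendor quotes "reads" for a paired-end run, it is worth checking whether that means read pairs (fragments) or individual reads.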

ADD COMMENT • link 6.4 years ago by GenoMax 151k

OK, thank you very much for the clarification! So, does that mean I should use 300M reads when calculating the number of lanes required for e.g. 20X coverage (as in the example above)? Or should I double the number of reads?

ADD REPLY • link 6.4 years ago by AP ▴ 100

Use the Illumina sequencing coverage calculator :-)

ADD REPLY • link 6.4 years ago by GenoMax 151k

Thanks, but I don't find it very helpful or clear. I like being able to calculate this by hand myself.

ADD REPLY • link 6.4 years ago by AP ▴ 100

Using the published specifications for the HiSeq 3000/4000:

2,500,000,000 single-end reads per 8 lanes = 312,500,000 reads per lane OR
5,000,000,000 paired-end reads per 8 lanes = 625,000,000 reads per lane

625,000,000 x 150 = 9.375e10 total bases (about 93.75 Gb) per lane for paired-end 150 bp reads.
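The same arithmetic as a small Python sketch, in case you want to redo it for another read length or flowcell. The function name is just for illustration; the inputs are the specification numbers quoted above.

```python
def per_lane_yield(reads_per_flowcell, read_length, lanes=8):
    """Return (reads per lane, bases per lane) from flowcell-level specs."""
    reads_per_lane = reads_per_flowcell // lanes
    bases_per_lane = reads_per_lane * read_length
    return reads_per_lane, bases_per_lane


# Paired-end: 5,000,000,000 reads per 8-lane flowcell, 150 bp per read.
reads, bases = per_lane_yield(5_000_000_000, 150)
print(reads)  # 625000000
print(bases)  # 93750000000
```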

What is the average length of your 75,000 fragments going to be? You would be sampling the sequence between the two ends.
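Once you know the average fragment length, the by-hand calculation is: target bases = fragments x average length, required bases = target bases x coverage, lanes = required bases / bases per lane. A rough sketch follows; the 500 bp average length is a purely hypothetical placeholder (the real value was not given in this thread), and the function name is made up. It also ignores duplicates, uneven coverage and off-target reads, so treat the result as a lower bound.

```python
def lanes_needed(n_fragments, mean_fragment_length, coverage,
                 bases_per_lane=9.375e10):
    """Estimate the fraction of a lane needed for a given coverage.

    bases_per_lane defaults to the HiSeq 3000/4000 PE150 figure
    computed in this thread (625,000,000 reads x 150 bp).
    """
    target_bases = n_fragments * mean_fragment_length
    required_bases = target_bases * coverage
    return required_bases / bases_per_lane


# 75,000 fragments at 20X, assuming a hypothetical 500 bp average length.
print(lanes_needed(75_000, 500, 20))  # roughly 0.008 of a lane
```

Note that this counts bases rather than reads, which is why the fragment length matters: 20 reads per fragment only gives 20X if each read covers the whole fragment.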

ADD REPLY • link 6.4 years ago by GenoMax 151k