Number of passing clusters vs. number of read pairs vs. total number of reads
5.9 years ago
AP ▴ 100

Hi all,

I apologize for a rather basic question but I am confused about the terminology.

What is really the difference between:

  • Number of clusters passing filter
  • Number of clusters
  • Number of read pairs per lane
  • Total number of reads

For instance, a HiSeq 4000 should produce about 300M reads per lane. What does that mean exactly? If I sequence at PE150, does that mean the total number of expected reads should be 600M? This is quite important when budgeting a project. Would 75,000 fragments at 20X coverage require 1,500,000 reads, and so 0.0025 lanes of a HiSeq 4000 (1,500,000/600M)?

Any help clarifying this would be highly appreciated!

illumina flowcell reads • 6.0k views
5.9 years ago
GenoMax 147k
  • Number of clusters: library fragments anchored to the flowcell that are capable of producing sequence. This number is fixed for patterned flowcells but variable for other flowcells, and it depends on library quality.
  • Number of clusters passing the chastity filter: the initial Illumina data processing filter, which keeps clusters that give a pure signal of sufficient quality.

Chastity is defined as the ratio of the brightest base intensity divided by the sum of the brightest and second brightest base intensities. Clusters “pass filter” if no more than 1 base call has a chastity value below 0.6 in the first 25 cycles.

  • Number of read pairs per lane = number of clusters passing filter in that lane (multiply by 2 if counting individual reads).

Illumina generally counts both reads of a pair in its specifications, so a quoted number of paired-end reads corresponds to only half as many unique library fragments.
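
To make the definitions above concrete, here is a minimal Python sketch of the pass-filter rule as quoted and of how clusters passing filter translate into reads. The function names and the intensity values are made up for illustration; this is not Illumina's actual implementation, only the quoted rule written out under the assumption that each cycle gives four positive channel intensities.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    return top / (top + second)  # assumes positive intensities


def passes_filter(cycles, threshold=0.6, n_cycles=25, max_low=1):
    """True if at most `max_low` of the first `n_cycles` calls fall below `threshold`."""
    low = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return low <= max_low


def lane_counts(clusters, paired_end=True):
    """Count clusters passing filter and the reads they yield."""
    pf = sum(1 for cycles in clusters if passes_filter(cycles))
    return {"clusters_pf": pf, "reads": pf * (2 if paired_end else 1)}


# Made-up intensities for the four base channels at one cycle.
clean = (100.0, 20.0, 5.0, 5.0)   # chastity 100 / 120, about 0.83
mixed = (50.0, 45.0, 3.0, 2.0)    # chastity 50 / 95, about 0.53

good = [clean] * 25
one_low = [mixed] + [clean] * 24      # a single low call is still tolerated
two_low = [mixed] * 2 + [clean] * 23  # two low calls fail the filter

print(lane_counts([good, one_low, two_low]))
# {'clusters_pf': 2, 'reads': 4}
```

Two of the three toy clusters pass, which gives two read pairs, or four reads when both ends are counted.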


OK, thank you very much for the clarification! So, does that mean I should consider 300M reads when calculating the number of lanes required for e.g. 20X coverage (as in the example above)? Or should I double the number of reads?


Thanks, but I don't find it very helpful or clear. I like being able to calculate this by hand myself.


Using the published specifications for the HiSeq 3000/4000:

2,500,000,000 single-end reads per 8 lanes = 312,500,000 reads per lane OR
5,000,000,000 paired-end reads per 8 lanes = 625,000,000 reads per lane

625,000,000 x 150 = 9.375 x 10^10 total bases per lane for paired-end reads.

What is the average length of your 75,000 fragments going to be? You would be sampling the sequence between the two ends.
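
Once the fragment length is known, the by-hand calculation can be sketched in a few lines of Python. The 500 bp mean fragment length below is purely hypothetical, since it was not given in the thread, and the sketch assumes that every sequenced base is usable, that the two reads of a pair do not overlap, and that coverage is uniform. The function name is made up for illustration.

```python
READS_PER_LANE_PE = 625_000_000  # paired-end reads per lane, from the figures above
READ_LENGTH = 150                # PE150


def lanes_needed(n_fragments, mean_fragment_bp, coverage,
                 reads_per_lane=READS_PER_LANE_PE, read_length=READ_LENGTH):
    """Fraction of a lane needed to reach `coverage` over all fragments."""
    target_bp = n_fragments * mean_fragment_bp     # total sequence to cover
    required_bp = target_bp * coverage             # bases needed at that depth
    bases_per_lane = reads_per_lane * read_length  # 9.375e10 with the defaults
    return required_bp / bases_per_lane


# Hypothetical input: 75,000 fragments of 500 bp each at 20X coverage.
fraction = lanes_needed(75_000, 500, 20)
print(f"{fraction:.4f} lanes")  # 0.0080 lanes
```

The budget is driven by bases rather than read counts, which is why the fragment length matters: doubling the mean fragment length doubles the fraction of a lane required.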
