Question

Problem with demultiplexing Illumina dual-indexed libraries

1


6.4 years ago

m.sadman.sakib ▴ 120

Hello,

I am having a problem demultiplexing my dual-indexed reads, which were run on a HiSeq 2000. The lane summary looks like this: [Image: Whole flow cell summary]

Each lane has around 230 million reads, and the sequencing itself went fine. I put 3 samples in each lane to get around 70 million reads per sample, but I get almost half of that. I noticed that the number of undetermined reads is quite high. For example: [Image: Undetermined read statistics]

Above are the statistics for the undetermined reads. They cover almost 30%, 40%, and sometimes 60% of the flow cell. We did not allow any mismatches when running bcl2fastq to generate the FASTQ files. But among those undetermined reads, I have many cases where the i7 index is fine but the i5 index has one or two mismatches. An example is below: [Image: Undetermined reads in each lane with barcode sequences]

I used Illumina CD indices to make the libraries (a.k.a. HT indices; i7: D701, D702, D703, ...; i5: D501, D502, D503, etc.). In this case, I do not care about the i5 indices, as they are the same for all 3 samples in a lane. For example, Sample 1: D701-D502, Sample 2: D702-D502, Sample 3: D703-D502. This is how I ran lane 1, and lanes 2, 3 and so on were set up similarly. Therefore, my question is the following:

Is it possible to run bcl2fastq so that it demultiplexes the BCL files based only on index 1 (i7), even though the samples were prepared with dual indexing? Or is there a better way to demultiplex in this situation to recover more reads?

I would really appreciate it if anyone could help.
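The pattern described above (good i7, damaged i5) can be quantified directly from the Undetermined FASTQ headers. The following is only a minimal sketch: it assumes the barcode is the last colon-separated field of each header and that the two indices are joined with `+`, which may not match the header layout produced by every bcl2fastq version. The function names and the example headers are hypothetical, and the index sequences are placeholders to be replaced with those from your own sample sheet.

```python
from collections import Counter


def index_pair(header):
    """Split the barcode field of a FASTQ header into (i7, i5).

    Assumes the barcode is the last ':'-separated field and that the
    two indices are joined by '+'.
    """
    barcode = header.rstrip().rsplit(":", 1)[-1]
    i7, _, i5 = barcode.partition("+")
    return i7, i5


def hamming(a, b):
    """Number of differing positions; a length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def tally_i5_errors(headers, expected_i7, expected_i5):
    """Count reads by i5 mismatch number, restricted to reads with a valid i7."""
    counts = Counter()
    for header in headers:
        i7, i5 = index_pair(header)
        if i7 in expected_i7:
            counts[hamming(i5, expected_i5)] += 1
    return counts


# Hypothetical header lines; in practice these would be every 4th line
# of the Undetermined FASTQ file.
EXAMPLE_HEADERS = [
    "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:ATTACTCG+ATAGAGGC",
    "@INSTR:1:FLOWCELL:1:1101:1001:2001 1:N:0:ATTACTCG+ATANAGGC",
    "@INSTR:1:FLOWCELL:1:1101:1002:2002 1:N:0:TCCGGAGA+NTAGAGGN",
    "@INSTR:1:FLOWCELL:1:1101:1003:2003 1:N:0:GGGGGGGG+ATAGAGGC",
]

EXAMPLE_COUNTS = tally_i5_errors(
    EXAMPLE_HEADERS,
    expected_i7={"ATTACTCG", "TCCGGAGA", "CGCTCATT"},  # placeholder i7 set
    expected_i5="ATAGAGGC",                             # placeholder i5
)
print(dict(EXAMPLE_COUNTS))
```

If most reads with a valid i7 land in the 1 or 2 mismatch bins, that would suggest the losses come from the i5 read rather than from the libraries themselves.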

RNA-Seq next-gen sequencing illumina sequence • 6.7k views

updated 6.4 years ago by Gabriel R. ★ 2.9k • written 6.4 years ago by m.sadman.sakib ▴ 120

0


This is an excellent example of why one should never use the same index for all samples (either in first or second location).

6.4 years ago by GenoMax 153k

score 3 · Answer 1 · 2019-04-03

Just omit index 2 from your sample sheet, and redo the demultiplexing.

The software that calls bases from clusters flips out when the entire flowcell lights up for a single base. That leads to Ns in that index. The demultiplexing software is bound and determined to use that awful index 2 if you tell it to do so, and then the read fails demultiplexing because of the Ns.

So just drop that from the sample sheet. bcl2fastq will not mind.
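As a rough illustration of dropping index 2, the sketch below removes the `I5_Index_ID` and `index2` columns from the `[Data]` section of a sample sheet and leaves every other section untouched. The column names follow the usual bcl2fastq2-style layout, which is an assumption about your sheet; the function names and the example sheet contents are hypothetical.

```python
import csv
import io


def _csv_line(row):
    """Serialise one row back to CSV text without a trailing newline."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().rstrip("\r\n")


def drop_index2(sheet_text, columns=("I5_Index_ID", "index2")):
    """Return sheet_text with the given columns removed from [Data]."""
    out, in_data, drop = [], False, None
    for line in sheet_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Section marker, possibly padded with trailing commas.
            in_data = stripped.rstrip(",").lower() == "[data]"
            drop = None
            out.append(line)
        elif not in_data or not stripped:
            out.append(line)
        else:
            row = next(csv.reader([line]))
            if drop is None:
                # First row after [Data] is the column header.
                drop = {i for i, name in enumerate(row) if name.strip() in columns}
            out.append(_csv_line([v for i, v in enumerate(row) if i not in drop]))
    return "\n".join(out) + "\n"


# Hypothetical sample sheet; index sequences are placeholders.
EXAMPLE_SHEET = """[Header]
Workflow,GenerateFASTQ
[Data]
Lane,Sample_ID,I7_Index_ID,index,I5_Index_ID,index2
1,Sample1,D701,ATTACTCG,D502,ATAGAGGC
1,Sample2,D702,TCCGGAGA,D502,ATAGAGGC
1,Sample3,D703,CGCTCATT,D502,ATAGAGGC
"""

SINGLE_INDEX_SHEET = drop_index2(EXAMPLE_SHEET)
print(SINGLE_INDEX_SHEET)
```

If bcl2fastq objects to the mismatch between the run configuration and the edited sheet, its `--use-bases-mask` option can be used to mark the second index read as ignored (for example with `N*` in that position); the exact mask depends on your run layout.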

"Basically, we did not allow any mismatch while running bcl2fastq to generate the FASTQ files."

That's probably too stringent. Most index sets have been designed to be robust to a single error. Take advantage of that and let bcl2fastq run with its default setting of one mismatch. The software will tell you if the indices you have won't support that much flexibility.
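Whether a set of indices can tolerate a mismatch can also be checked by hand. The sketch below uses the general rule that allowing m mismatches is unambiguous when every pair of indices differs in at least 2m + 1 positions; bcl2fastq's own collision check may be implemented differently. The sequences listed for D701 to D703 are what I believe the i7 sequences to be and should be verified against your own sample sheet; `max_safe_mismatches` is a hypothetical helper name.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length indices differ."""
    if len(a) != len(b):
        raise ValueError("indices must have the same length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(indices):
    """Return (minimum pairwise distance, mismatches tolerated without ambiguity)."""
    min_dist = min(hamming(a, b) for a, b in combinations(indices, 2))
    return min_dist, (min_dist - 1) // 2


# Assumed i7 sequences for the three samples in one lane; verify before use.
I7_INDICES = {
    "D701": "ATTACTCG",
    "D702": "TCCGGAGA",
    "D703": "CGCTCATT",
}

MIN_DIST, TOLERATED = max_safe_mismatches(I7_INDICES.values())
print(f"minimum pairwise distance: {MIN_DIST}, mismatches tolerated: {TOLERATED}")
```

With only three indices per lane, the pairwise distances are usually large enough that one mismatch cannot cause a read to be assigned to the wrong sample.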

score 0 · Answer 2 · 2019-04-03


6.4 years ago

Gabriel R. ★ 2.9k

I am biased, but I would recommend my own deML. It is a maximum-likelihood demultiplexer designed to deal with uncertainty and partial information. I still maintain it, so let me know if you have any issues.
