Question

Salvage barcode-undetermined reads from Illumina HiSeq 2500 2 x 100 bp paired-end runs after demultiplexing

9.2 years ago

Louis Kok ▴ 30

Hi All, I have too many undetermined reads from a HiSeq run that cannot be assigned to any sample because of barcode issues. Samples were multiplexed with dual barcodes (two 8 bp indexes, 16 bp in total). I used the bcl2fastq-1.8.4 script to demultiplex, allowing at most one base mismatch.

After demultiplexing, I found that there are too many undetermined reads. On checking the barcodes of these reads, I found that they have two or more base mismatches to the indexes that were used for multiplexing. Has anyone tried to salvage undetermined reads, perhaps by allowing more mismatches? If yes, how many mismatches should be allowed while keeping the outcome accurate?

Kindly share with me your experience. Thanks a lot.

demultiplexing Illumina undetermined barcode • 7.5k views

updated 2.7 years ago by Ram 45k • written 9.2 years ago by Louis Kok ▴ 30

FYI, whenever we've had this happen the samples/run had other issues and it ended up not being worthwhile salvaging the data.

9.2 years ago by Devon Ryan 105k

Ram · Answer 1 · 2016-01-27

We usually use the minimum pairwise Hamming distance between all barcodes as a guide for setting the number of allowed mismatches, e.g. if the minimum Hamming distance is 3 we allow at most 1 mismatch, if it is 5 we allow at most 2, and so on. In general:

#mismatches = floor( (min(hamming(i,j)) - 1) / 2) for all barcodes i and j (i != j)

But I agree in most cases 1 mismatch works fine.
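As a rough illustration of the rule above, here is a minimal Python sketch that computes the minimum pairwise Hamming distance for a barcode list and derives the mismatch limit from it. The function names and the example barcodes are made up for illustration; they are not taken from any sample sheet or from bcl2fastq itself.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(barcodes):
    """floor((minimum pairwise Hamming distance - 1) / 2)."""
    min_dist = min(hamming(a, b) for a, b in combinations(barcodes, 2))
    return (min_dist - 1) // 2


# Hypothetical 8 bp indexes, for illustration only.
example = ["AAAAAAAA", "AAAAATTT", "CCCCCCCC"]
print(max_safe_mismatches(example))  # closest pair differs at 3 positions -> 1
```

For dual indexes you would probably want to run this once per index read, since the mismatch setting is applied per index.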

Ram · Answer 2 · 2016-01-27

9.2 years ago

dariober 15k

bcl2fastq has a --barcode-mismatches option, which is the "number of allowed mismatches per index"; just re-run it with --barcode-mismatches 2 or 3. In my experience the default of 1 works well in most cases. However, I would make sure that such an error rate is not due to problems with the run or with the sample labelling.
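To get a feel for what raising the mismatch limit does to a single read, here is a small Python sketch. It is not how bcl2fastq works internally; `assign_barcode` and the barcodes are hypothetical, and the only assumption is that a read is assigned when exactly one expected barcode lies within the limit.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, expected, max_mismatches=1):
    """Return the single expected barcode within max_mismatches of the
    observed one, or None if there is no match or more than one match."""
    hits = [
        bc for bc in expected
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcodes, for illustration only.
expected = ["AAAAAAAA", "CCCCCCCC"]
print(assign_barcode("AAAAAATT", expected, max_mismatches=1))  # None
print(assign_barcode("AAAAAATT", expected, max_mismatches=2))  # AAAAAAAA
```

Running something like this over the most frequent barcodes in the undetermined reads is a cheap way to see how many reads a higher limit would actually recover before re-running the whole demultiplexing.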

updated 5.3 years ago by Ram 45k • written 9.2 years ago by dariober 15k

That is likely to not work, and here's why (at least with my version of the pipeline; I'd be happy to hear this has been fixed).

When you set it to a mismatch of 1, it takes each barcode and generates all the possible off-by-one barcodes, so when it sees those, it knows which barcode it's really supposed to be. So for AAAAAAAA, it decides that AAAAAAAT is one of those one-offs. If you also have AAAAAATT in the same lane, one of its one-off barcodes is also AAAAAAAT. Rather than smartly saying "Well, if we see that exact barcode sequence, we'll just skip it, because we don't know what it's supposed to be", the software will refuse to process the lane. So when using mismatch-1, your barcodes have to differ from each other by at least 3 letters. If you up the mismatch allowance, there are likely to be barcode clashes that you didn't have to worry about at mismatch 1.
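The clash described above can be checked ahead of time. Below is a minimal Python sketch that enumerates every sequence within a given number of substitutions of each barcode and reports the ones claimed by more than one barcode. This is only my reading of the behaviour, not the pipeline's actual code; the function names and the "ACGTN" alphabet are assumptions.

```python
from itertools import combinations, product


def neighbours(barcode, max_mismatches, alphabet="ACGTN"):
    """All sequences within max_mismatches substitutions of barcode,
    including the barcode itself."""
    result = {barcode}
    for k in range(1, max_mismatches + 1):
        for positions in combinations(range(len(barcode)), k):
            for letters in product(alphabet, repeat=k):
                seq = list(barcode)
                for pos, letter in zip(positions, letters):
                    seq[pos] = letter
                result.add("".join(seq))
    return result


def find_clashes(barcodes, max_mismatches):
    """Map each ambiguous sequence to the set of barcodes that claim it."""
    owners = {}
    for bc in barcodes:
        for seq in neighbours(bc, max_mismatches):
            owners.setdefault(seq, set()).add(bc)
    return {seq: bcs for seq, bcs in owners.items() if len(bcs) > 1}


clashes = find_clashes(["AAAAAAAA", "AAAAAATT"], max_mismatches=1)
print(sorted(clashes))  # ['AAAAAAAT', 'AAAAAATA']
```

An empty result at mismatch 1 that turns non-empty at mismatch 2 is exactly the situation where raising the allowance would introduce new clashes.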

updated 5.3 years ago by Ram 45k • written 9.2 years ago by swbarnes2 14k

Can you really blame the software when you don't follow Illumina's recommendations for what are compatible barcodes?

9.2 years ago by Devon Ryan 105k