BCL2FASTQ Dual-Index Barcode Collisions - Is the i5 Index not considered when determining the number of mismatches?
3.8 years ago
DavidStreid ▴ 90

Hi,

I am trying to understand what triggers a barcode collision. When a collision is detected, bcl2fastq throws the following error:

Barcodes with too few mismatches are ambiguous ( less than 2 times the number of mismatches plus 1)

Could someone clarify how the number of mismatches is counted? Specifically, is it taken from the i5 & i7 combined, from the i5 & i7 separately, or from the i7 only? Below I looked at a sample sheet with i7 (index) & i5 (index2) indices, and I suspect the count is taken from the i7 only, because that is where the number of mismatches seems to fit the error bcl2fastq throws.

Here are the relevant rows from the sample sheet:

index,index2,
CTTCCTTC,GAAGGAAG
CGTCTTCA,TGAAGACG
CTTCCTTC,GAAGGAAG
CGAACAAC,GTTGTTCG
GATCAGAT,AGATCTCG
TAGCTTAT,AGATCTCG

And here is what I got for each of the following --barcode-mismatches arguments:

--barcode-mismatches 1: No Barcode Collision

  • GATCAGAT+AGATCTCG & TAGCTTAT+AGATCTCG didn't throw a collision even though the i5 indices are identical

--barcode-mismatches 2: Barcode Collision

std::exception::what: Barcode collision for barcodes: CTTCCTTC+GAAGGAAG, CGTCTTCA+TGAAGACG
  • The i7 indices, CTTCCTTC & CGTCTTCA, have 4 mismatches.

--barcode-mismatches 3: Barcode Collision

std::exception::what: Barcode collision for barcodes: CTTCCTTC+GAAGGAAG, CGAACAAC+GTTGTTCG
  • The i7 indices, CTTCCTTC & CGAACAAC, have 5 mismatches.
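
For reference, the mismatch counts quoted above can be reproduced with a few lines of Python. This is only a sketch: `hamming` and `is_ambiguous` are names I made up, and the threshold is read literally off the error message (fewer than 2 times the allowed mismatches plus 1), not taken from the bcl2fastq source.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_ambiguous(a: str, b: str, allowed_mismatches: int) -> bool:
    """Apply the rule from the error message to a single index read (assumption)."""
    return hamming(a, b) < 2 * allowed_mismatches + 1


# i7 pairs from the sample sheet above
print(hamming("CTTCCTTC", "CGTCTTCA"))  # 4
print(hamming("CTTCCTTC", "CGAACAAC"))  # 5
print(hamming("GATCAGAT", "TAGCTTAT"))  # 4

print(is_ambiguous("GATCAGAT", "TAGCTTAT", 1))  # False: 4 is not < 3
print(is_ambiguous("CTTCCTTC", "CGTCTTCA", 2))  # True: 4 < 5
print(is_ambiguous("CTTCCTTC", "CGAACAAC", 3))  # True: 5 < 7
```

The i5 pairs for the two colliding cases give the same counts (4 and 5), so these rows alone cannot tell whether the i5 is being checked.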

Please let me know if I can clarify something. I don't quite have a grasp on this so I'd be happy to provide more info to get some help.

bcl2fastq barcode collision barcode-mismatches • 3.4k views
3.8 years ago
GenoMax 148k

If you cannot discriminate between two samples using the i7 index alone once you allow for errors, then there is no added value in considering the i5 index. Longer indexes with a good edit distance between them are essential.

BTW: the acceptable values for --barcode-mismatches are 0, 1, and 2.


Oof, I guess my third example isn't practical. But thank you! Especially if the maximum allowed is 2, I can see that with --barcode-mismatches 2 it would be poor planning to have two indices that differ by only 4 mismatches.

EDIT

After reviewing more runs, I think the rule is this: mismatches in the i5 are only considered if the i7 is ambiguous, and each index is evaluated separately. Take the example below, which didn't throw a barcode error. Two samples in the same lane have i7 indices (index), GCGGTATT & CCGGAATT, that differ by only two mismatches. With --barcode-mismatches 1 this satisfies the rule for the error, NUM MISMATCHES < (ALLOWED_BARCODE_MISMATCH * 2) + 1, since 2 < 3. However, no error was thrown. I believe this is because the i5 indices, GGTAACAA and ACCGAATG, have 7 mismatches, which is well above the threshold. This differs from the cases in the original post, where the collision errors were thrown when the i5 index was also ambiguous.

Does this make sense?

SampleSheet

index,index2
GCGGTATT,GGTAACAA
CCGGAATT,ACCGAATG

Command

bcl2fastq \
  --runfolder-dir PATH/TO/RUN \
  --sample-sheet PATH/TO/SAMPLESHEET \
  --output-dir . \
  --barcode-mismatches 1

Result

No collision error
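
The rule I am guessing at can be written down as a small check. This is a sketch of my hypothesis, not of bcl2fastq's actual implementation: `hamming` and `find_collisions` are hypothetical helpers, and each sample is assumed to be an (index, index2) tuple from the same lane.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_collisions(samples, allowed_mismatches):
    """Return sample pairs in which EVERY index is ambiguous (hypothesised rule)."""
    threshold = 2 * allowed_mismatches + 1
    collisions = []
    for s1, s2 in combinations(samples, 2):
        # A pair collides only if i7 AND i5 are each below the threshold.
        if all(hamming(x, y) < threshold for x, y in zip(s1, s2)):
            collisions.append((s1, s2))
    return collisions


# Sample sheet from this comment: i7 differs by 2, i5 differs by 7.
followup = [("GCGGTATT", "GGTAACAA"), ("CCGGAATT", "ACCGAATG")]
print(find_collisions(followup, 1))  # []

# First colliding pair from the original post: i7 and i5 both differ by 4.
original = [("CTTCCTTC", "GAAGGAAG"), ("CGTCTTCA", "TGAAGACG")]
print(len(find_collisions(original, 2)))  # 1
```

Under this guess, a distinct i5 rescues an ambiguous i7, which is consistent with the run above.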


There's no error here because there is no possible sequence which is one mistake away from two different barcode sequences. But there are sequences which are two mistakes away from two different barcodes.
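
One way to check this reasoning is to enumerate reads by brute force. The sketch below assumes that --barcode-mismatches is applied to each index read separately and that only substitutions count as mismatches; `neighbors` and `shared_reads` are illustrative names, not part of bcl2fastq.

```python
from itertools import combinations, product

BASES = "ACGT"


def neighbors(barcode: str, max_mismatches: int) -> set:
    """All sequences within max_mismatches substitutions of barcode."""
    result = {barcode}
    for k in range(1, max_mismatches + 1):
        for positions in combinations(range(len(barcode)), k):
            for subs in product(BASES, repeat=k):
                seq = list(barcode)
                for pos, base in zip(positions, subs):
                    seq[pos] = base
                result.add("".join(seq))
    return result


def shared_reads(sample_a, sample_b, max_mismatches: int) -> int:
    """Count index reads that would match both samples, tolerance per index."""
    count = 1
    for a, b in zip(sample_a, sample_b):
        # Indexes are independent, so the per-index overlaps multiply.
        count *= len(neighbors(a, max_mismatches) & neighbors(b, max_mismatches))
    return count


# Pair from the comment above, one mismatch allowed: no read fits both.
print(shared_reads(("GCGGTATT", "GGTAACAA"), ("CCGGAATT", "ACCGAATG"), 1))  # 0

# Colliding pair from the original post, two mismatches allowed.
print(shared_reads(("CTTCCTTC", "GAAGGAAG"), ("CGTCTTCA", "TGAAGACG"), 2))  # 36
```

A count of zero means no read can be assigned to both samples; any positive count means at least one read would be ambiguous.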
