Hi,
For Illumina sequencing data with dual indexes (151 read1 + 8 index1 + 8 index2 + 151 read2), the conventional demultiplexing method is to set both index1 and index2 for each sample.
However, for some data (e.g. a UMI in index2), only index1 is fixed and index2 is random, so there is no way to set both index1 and index2 in the sample sheet.
In such a case, is it possible to demultiplex the data using only index1? It seems bcl2fastq doesn't support such a setting. Does anyone have any experience with this?
Yes, different samples with different index 1 sequences can have the same random index 2.
Currently I demultiplex all data to Undetermined and then split the FASTQ file by its index 1, but that is time-consuming. I may try to alter the bcl2fastq source code to support index 1-based demultiplexing for dual-index data.
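For anyone wanting to try the same workaround, here is a minimal sketch of the split step in Python. It assumes the Undetermined FASTQ headers end in `:<index1>+<index2>`, which is how bcl2fastq writes dual-index barcodes; the sample names, the index sequences, and the helper names (`read_fastq`, `index1_of`, `split_by_index1`) are made up for illustration. It matches index 1 exactly, with no mismatch tolerance.

```python
import io

def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a text-mode FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield header, seq, plus, qual

def index1_of(header):
    """Return index 1 from a header that ends in ':<index1>+<index2>'."""
    barcode = header.rstrip().rsplit(":", 1)[-1]
    return barcode.split("+", 1)[0]

def split_by_index1(handle, index1_to_sample):
    """Group FASTQ records by sample, looked up by an exact index 1 match."""
    groups = {}
    for record in read_fastq(handle):
        # Reads whose index 1 is not in the table go to "unassigned".
        sample = index1_to_sample.get(index1_of(record[0]), "unassigned")
        groups.setdefault(sample, []).append("".join(record))
    return groups

# Toy input standing in for an Undetermined FASTQ (hypothetical reads).
demo = io.StringIO(
    "@M1:1:FC:1:1:1:1 1:N:0:ACGTACGT+TTTTGGGG\nACGT\n+\nIIII\n"
    "@M1:1:FC:1:1:2:2 1:N:0:TGCATGCA+CCCCAAAA\nTTTT\n+\nIIII\n"
    "@M1:1:FC:1:1:3:3 1:N:0:ACGTACGT+GGGGCCCC\nGGGG\n+\nIIII\n"
)
groups = split_by_index1(demo, {"ACGTACGT": "sampleA", "TGCATGCA": "sampleB"})
```

On real NovaSeq data the records should be written straight to per-sample output files (for example through `gzip.open(..., "wt")`) instead of being collected in memory as this toy version does.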
How many random indexes are generally expected in index 2 (tens, hundreds, or more)? Doing #1 in my comment above may be faster if the number of index 2 sequences is manageable.
Thousands or even more.
I think doing #1 is probably going to be the fastest option. One can easily collect the index combinations from the files produced by round 1 of demultiplexing. Since you work with NovaSeq, the data files must be huge.
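Collecting the index combinations from the round 1 output could look roughly like the sketch below. It again assumes headers that end in `:<index1>+<index2>`; the function names and the example headers are hypothetical. The per-index-1 summary gives a quick idea of how many distinct index 2 sequences there actually are.

```python
from collections import Counter, defaultdict

def count_index_pairs(headers):
    """Count reads per (index1, index2) pair from FASTQ header lines."""
    pairs = Counter()
    for header in headers:
        barcode = header.rstrip().rsplit(":", 1)[-1]
        index1, _, index2 = barcode.partition("+")
        pairs[(index1, index2)] += 1
    return pairs

def distinct_index2_per_index1(pairs):
    """Return how many different index 2 sequences were seen per index 1."""
    seen = defaultdict(set)
    for index1, index2 in pairs:
        seen[index1].add(index2)
    return {index1: len(index2s) for index1, index2s in seen.items()}

# Hypothetical header lines, as they would appear in a round 1 FASTQ file.
headers = [
    "@M1:1:FC:1:1:1:1 1:N:0:ACGTACGT+TTTTGGGG",
    "@M1:1:FC:1:1:2:2 1:N:0:ACGTACGT+TTTTGGGG",
    "@M1:1:FC:1:1:3:3 1:N:0:ACGTACGT+GGGGCCCC",
    "@M1:1:FC:1:1:4:4 1:N:0:TGCATGCA+CCCCAAAA",
]
pairs = count_index_pairs(headers)
summary = distinct_index2_per_index1(pairs)
```

With thousands of random index 2 sequences per sample the resulting list gets long, so it is worth checking the summary before deciding whether this route is practical.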
Biologically speaking, how are you even getting the UMI in index read 2?
Maybe it is something like this:
Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.
Yes, with customized primers
Ah, that'll definitely break Illumina's software.