Hi all,
I have data from a single-end run with a dual-index structure, generated on a NextSeq 500 instrument. I want to demultiplex it with the bcl2fastq tool. What should my SampleSheet.csv structure and bcl2fastq command look like?
I used a SampleSheet structure (shared below) with a bcl2fastq command (also below), and about 18% of reads were undetermined, which I think is a good ratio (the total number of reads was around 76 million and ~14 million reads were undetermined).
The SampleSheet.csv structure is as follows. The index column is for index i7 and the index2 column is for index i5.
[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
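A quick sanity check on a sheet like this can be sketched in Python. This is only a minimal sketch: it assumes the [Data] section is the last section in the file and that the column names are exactly those shown above. The helper names (`read_data_section`, `index_lengths`, `duplicate_index_pairs`) are made up for illustration and are not part of bcl2fastq.

```python
import csv
import io

# Sample sheet text copied from the post above.
SAMPLE_SHEET = """\
[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
"""


def read_data_section(text):
    """Return the sample rows of the [Data] section as a list of dicts."""
    lines = text.splitlines()
    # Locate the line that opens the [Data] section.
    start = next(i for i, line in enumerate(lines) if line.startswith("[Data]"))
    # The line right after [Data] is the column header; the rest are samples.
    reader = csv.DictReader(io.StringIO("\n".join(lines[start + 1:])))
    return [row for row in reader if row.get("Sample_ID")]


def index_lengths(rows):
    """Return the set of (len(index), len(index2)) pairs found in the sheet."""
    return {(len(r["index"]), len(r["index2"])) for r in rows}


def duplicate_index_pairs(rows):
    """Return index/index2 combinations assigned to more than one sample."""
    seen, dups = set(), set()
    for r in rows:
        pair = (r["index"], r["index2"])
        if pair in seen:
            dups.add(pair)
        seen.add(pair)
    return dups


rows = read_data_section(SAMPLE_SHEET)
print(index_lengths(rows))          # {(8, 8)}
print(duplicate_index_pairs(rows))  # set()
```

Both samples share the same i5 index but differ in i7, so each index pair is still unique and every index is 8 bases long.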
The bcl2fastq command is as below.
bcl2fastq --runfolder-dir --output-dir --sample-sheet --barcode-mismatches 0
Is the overall approach correct? Should I include the --use-bases-mask option as well?
Thank you for your input.
Thank you for your input. How can I check whether the length of the index cycles matches the length of my indexes? I am not sure about it and do not want to make a mistake.
Is it not acceptable to use bcl2fastq with the default options and let the tool identify the indices based on the samplesheet?
You need to know how this flowcell was run, say 50x8x8x50. In that example, the length of the index cycles (8) matches the indexes in your samplesheet. You can see the number of cycles in the RunInfo.xml file (in the data folder you have).
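The check described above can be sketched in Python. The XML fragment below is illustrative only and is NOT the poster's file: the read lengths (76, 8, 8) are assumptions chosen to resemble a single-end, dual-index run. The sketch relies on the `NumCycles` and `IsIndexedRead` attributes of the `Read` elements in RunInfo.xml; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative RunInfo.xml fragment (assumed values, not the real run).
EXAMPLE_RUNINFO = """\
<RunInfo>
  <Run Id="EXAMPLE_RUN" Number="1">
    <Reads>
      <Read Number="1" NumCycles="76" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
    </Reads>
  </Run>
</RunInfo>
"""


def read_structure(xml_text):
    """Return [(num_cycles, is_index_read), ...] in run order."""
    root = ET.fromstring(xml_text)
    return [
        (int(read.get("NumCycles")), read.get("IsIndexedRead") == "Y")
        for read in root.iter("Read")
    ]


def index_cycles(xml_text):
    """Return the cycle counts of the index reads only."""
    return [n for n, is_index in read_structure(xml_text) if is_index]


# Index lengths taken from the samplesheet in the question (i7, then i5).
sheet_index_lengths = [len("AACCGCGG"), len("CTAGCGCT")]

print(index_cycles(EXAMPLE_RUNINFO))                         # [8, 8]
print(index_cycles(EXAMPLE_RUNINFO) == sheet_index_lengths)  # True
```

For a real run folder, `ET.parse(path).getroot()` would replace `ET.fromstring(...)`, with `path` pointing at the RunInfo.xml in that folder.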
That is how bcl2fastq works. You seem to have changed only that one option, to require perfect matches on index reads; everything else is default.
Thank you for your input, it was helpful.
Dear GenoMax,
This is the header of my RunInfo.xml file:
The length of the indexes is 8 in the experiment and, based on what you mentioned, I think I do not need to use the --use-bases-mask option, right?
Correct. This is a single-end, dual-indexed (8 bp each) run.
Thank you.
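For reference, the read layout discussed in this thread could be written as an explicit bases mask. The sketch below only builds the mask string; it does not run bcl2fastq. The read length of 76 is an assumption (the thread does not state it), and `bases_mask` is a hypothetical helper. As noted in the replies, the option should not be needed here because the index cycles already match the samplesheet.

```python
def bases_mask(reads):
    """Build a --use-bases-mask style string from (num_cycles, is_index) tuples.

    'Y' marks cycles kept as sequence, 'I' marks cycles used as index.
    """
    parts = []
    for num_cycles, is_index in reads:
        letter = "I" if is_index else "Y"
        parts.append(f"{letter}{num_cycles}")
    return ",".join(parts)


# Assumed layout: one sequence read (length is a guess) plus two 8 bp index reads.
print(bases_mask([(76, False), (8, True), (8, True)]))  # Y76,I8,I8
```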