And the laneBarcode.html report shows that reads from lane 2 were demultiplexed, while all the reads from lane 1 went into the undetermined .gz files. So the command I ran worked for lane 2 without any problem but not for lane 1, which is why I do not think anything went wrong during the in silico analysis. Also, 'Top Unknown Barcodes' in the same file includes barcodes with many N's.
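To see the same picture as the 'Top Unknown Barcodes' table directly from the FASTQ files, you can tally the index sequences stored in the read headers of the undetermined files. The sketch below assumes bcl2fastq-style headers, where the last colon-separated field of the comment holds the observed index (dual indexes joined by '+'). It also reports how many N's sit in index 1 and index 2 separately, which makes it easy to see which index read is affected. The function names and the file name in the usage comment are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def header_barcodes(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the observed index string from each FASTQ header line.

    Assumes headers like '@INSTR:1:FC:1:1101:1000:2000 1:N:0:ACGTAC+TTGACC',
    i.e. the index is the last ':'-separated field of the comment.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # first line of each 4-line FASTQ record is the header
            comment = line.rstrip("\n").split(" ", 1)[-1]
            yield comment.rsplit(":", 1)[-1]


def top_unknown(fastq_lines: Iterable[str], n: int = 10) -> list[tuple[str, int, int, int]]:
    """Return (barcode, read count, Ns in index 1, Ns in index 2) for the n most common barcodes."""
    counts = Counter(header_barcodes(fastq_lines))
    rows = []
    for barcode, count in counts.most_common(n):
        index1, _, index2 = barcode.partition("+")  # index2 is '' for single-index runs
        rows.append((barcode, count, index1.count("N"), index2.count("N")))
    return rows


# Usage (hypothetical file name):
# import gzip
# with gzip.open("Undetermined_S0_L001_R1_001.fastq.gz", "rt") as fh:
#     for row in top_unknown(fh):
#         print(*row, sep="\t")
```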
It appears that lane 1 encountered some issue during sequencing; e.g. it may have been marginally overloaded, or there may have been a bubble in the lane at that time. You must have had a lot of missing BCL files (judging by the --ignore-missing-bcls option), and these result in N calls. Since you have more than two N's in the "Unknown Barcodes", adding --barcode-mismatches 2 will not help. Percent Q30 bases and the mean quality score are also poor for this lane.
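The reasoning about --barcode-mismatches can be made concrete with a small sketch. It assumes, as the replies in this thread do, that each N in an index counts as one mismatch, and it checks a single index read against one expected sequence. The index sequences are made up for illustration; this is not bcl2fastq's own code.

```python
def mismatches(observed: str, expected: str) -> int:
    """Count positions where the observed index differs from the expected one.

    'N' never equals a real base, so every N costs one mismatch.
    """
    if len(observed) != len(expected):
        raise ValueError("index lengths differ")
    return sum(o != e for o, e in zip(observed, expected))


def can_rescue(observed: str, expected: str, allowed: int) -> bool:
    """True if this index read would still match with `allowed` mismatches."""
    return mismatches(observed, expected) <= allowed


# Hypothetical 8 bp index sequence, for illustration only.
expected_i2 = "AGCTTGCA"
print(can_rescue("AGCTTGCN", expected_i2, 1))  # one N      -> True
print(can_rescue("ANCTNGCN", expected_i2, 2))  # three N's  -> False
```

With three or more N's in an index, no mismatch setting of 2 or lower can recover the read.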
You could consult Illumina support (or have your sequencing facility do so) to see if there was a hardware/software problem during the run. Depending on the cause, Illumina may replace the reagents (if your facility has a maintenance agreement), so you may be able to get your sample resequenced at no cost. Otherwise you will need to:
a. Drop the data from lane 1 (it is likely to have N's elsewhere in the main reads too; remove --no-lane-splitting so that each lane gets its own FASTQ files). Have you done a FastQC check? Only if there are no N's in the main reads could you use other programs, such as deML, to see whether you can demultiplex the data from lane 1.
b. Resequence the sample on a new flowcell if you need to reach a certain yield.
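For a quick check of N's in the main reads without running FastQC, you can compute the fraction of reads carrying an N at each cycle. This is roughly what FastQC's 'Per base N content' plot shows, except that positions are not binned here. The function name and the file name in the usage comment are hypothetical.

```python
from typing import Iterable


def n_fraction_per_cycle(fastq_lines: Iterable[str]) -> list[float]:
    """Fraction of reads with an 'N' at each cycle (position) of the read.

    Reads may differ in length; each position is divided by the number of
    reads long enough to cover it.
    """
    n_counts: list[int] = []
    covered: list[int] = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # second line of each 4-line record is the sequence
            continue
        seq = line.strip()
        if len(seq) > len(n_counts):  # grow counters for longer reads
            extra = len(seq) - len(n_counts)
            n_counts.extend([0] * extra)
            covered.extend([0] * extra)
        for pos, base in enumerate(seq):
            covered[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return [n / c for n, c in zip(n_counts, covered)]


# Usage (hypothetical file name):
# import gzip
# with gzip.open("Undetermined_S0_L001_R1_001.fastq.gz", "rt") as fh:
#     for cycle, frac in enumerate(n_fraction_per_cycle(fh), start=1):
#         if frac > 0:
#             print(cycle, f"{frac:.3%}")
```

Cycles with a clearly non-zero fraction point to the same sequencing problem seen in the index reads.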
I have checked the FastQC output, and the main reads also have Ns.
If that is the case, there was a problem with lane 1. Does it extend to lane 2 as well? It is difficult to discern a general issue from the plots above, but it looks like a few cycles during the index sequencing were affected. The resolution here may depend on your relationship with the sequencing facility. Ideally, this pool should be resequenced.
You have only 8 samples? You might be able to demultiplex with index1 alone.
Regardless of what their pretty pictures show, the demultiplexing report clearly shows that index 2 is a mess in lane 1. And there's nothing wrong with your in silico analysis if bcl2fastq works fine on one lane.
A single N can be tolerated in index 1. A lot of those lane 1 reads have just the one N in index 1 but otherwise match their expected sequence. Those reads will demultiplex properly if one mismatch is allowed. It looks like you have enough diversity to trim off the last base of index 1 altogether, and demultiplexing should still work.
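Whether index 1 alone (optionally with its last base trimmed) can separate the samples comes down to the minimum pairwise Hamming distance between the index 1 sequences. Two barcodes at distance d can be reached from the same observed index with k mismatches only if 2k >= d, so k mismatches are collision-free when k <= (d - 1) // 2. The sketch below uses hypothetical barcodes and helper names; substitute the index 1 sequences from your sample sheet. It illustrates the distance check, not bcl2fastq's internal assignment logic.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: list[str]) -> int:
    """Smallest pairwise Hamming distance among equal-length barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def safe_mismatches(barcodes: list[str]) -> int:
    """Largest k for which k-mismatch neighbourhoods of the barcodes cannot overlap."""
    return (min_distance(barcodes) - 1) // 2


def assign(observed: str, barcodes: dict[str, str], k: int) -> str:
    """Return the sample whose index 1 is within k mismatches, else 'Undetermined'."""
    hits = [sample for sample, bc in barcodes.items() if hamming(observed, bc) <= k]
    return hits[0] if len(hits) == 1 else "Undetermined"


# Hypothetical index 1 sequences, for illustration only.
samples = {"S1": "ACGTACGT", "S2": "TGCATGCA", "S3": "GATCGATC", "S4": "CTAGCTAG"}
full = list(samples.values())
trimmed = {s: bc[:-1] for s, bc in samples.items()}  # drop the last index 1 base

print(safe_mismatches(full), safe_mismatches(list(trimmed.values())))  # 3 3
print(assign("ACGTACGN", samples, 1))       # one N in the last cycle -> S1
print(assign("ACGTACGN"[:-1], trimmed, 0))  # same read, last base trimmed -> S1
```

If safe_mismatches drops to 0 after trimming, the trimmed index no longer tolerates even a single N.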
Looking at the Q scores and % bases >Q30 for the "undetermined" line in the first screenshot, there is a good chance that there are N's in the main read 1 as well. That is what I was referring to.
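The two quality columns mentioned here can be recomputed from any FASTQ file. The sketch assumes Phred+33 quality encoding (used by current Illumina FASTQ output) and averages over all bases, which may differ slightly from how the bcl2fastq report aggregates them. The function name and the file name in the usage comment are hypothetical.

```python
from typing import Iterable


def quality_summary(qual_strings: Iterable[str], offset: int = 33) -> tuple[float, float]:
    """Return (% bases >= Q30, mean Phred score) over all bases.

    Assumes Phred+33 encoding: score = ord(character) - 33.
    """
    total = q30 = score_sum = 0
    for qs in qual_strings:
        for ch in qs.strip():
            q = ord(ch) - offset
            total += 1
            score_sum += q
            if q >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no quality values")
    return 100 * q30 / total, score_sum / total


# Usage (hypothetical file name):
# import gzip
# with gzip.open("Undetermined_S0_L001_R1_001.fastq.gz", "rt") as fh:
#     quals = (line for i, line in enumerate(fh) if i % 4 == 3)  # 4th line = qualities
#     print(quality_summary(quals))
```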
Thank you very much for your answer!
We have contacted the sequencing facility, and they provided the report below, saying the problem might have happened during the in silico analysis:
I have checked the FastQC output, and the main reads in undetermined1/2.gz also have Ns.
There are no Ns in lane 2. This is supposed to be shallow WES sequencing. I was given the following information:
20040719 NovaSeq 6000 SP Reagent Kit v1.5 (200 cycles)
NovaSeq6000 SP [88.8M reads / sample --> ~1.8X]