Question

bcl2fastq

5 weeks ago

kilcdincer ▴ 10

Dear all,

I ran bcl2fastq to convert my BCL files to FASTQ files with the command below:

sudo bcl2fastq --runfolder-dir ./ -i Data/Intensities/BaseCalls -r 6 -p 6 -w 6 --no-lane-splitting --ignore-missing-bcls --sample-sheet SampleSheet_NovaSeq.csv

The laneBarcode.html report shows that reads from lane 2 were demultiplexed, while all the reads from lane 1 went into the undetermined.gz files. The same command worked for lane 2 without any problem but not for lane 1, so I do not think something went wrong during the in silico analysis. Also, 'Top Unknown Barcodes' in the same file includes barcodes with many 'N's.

Here is the screenshot from laneBarcode.html

enter image description here

What do you think the problem might be?

Thanks in advance!

bcl2fastq bcl fastq lane • 817 views

ADD COMMENT • link updated 5 weeks ago by GenoMax 148k • written 5 weeks ago by kilcdincer ▴ 10

score 3 · Answer 1 · 2024-11-14

5 weeks ago

GenoMax 148k

It appears that lane 1 encountered some issue during sequencing, e.g. it may have been marginally overloaded or there may have been a bubble in the lane at that time. You must have had a lot of missing BCL files (judging by the --ignore-missing-bcls option), and these are going to result in the N calls. Since you have more than two N's in "Unknown Barcodes", adding --barcode-mismatches 2 will not help. Percent Q30 bases and the mean quality score are also poor for this lane.

You could consult Illumina support (or your sequencing facility will need to do this) to see if there was a hardware/software problem during the run. Depending on the cause, Illumina may replace the reagents (if your facility has a maintenance agreement), so you may be able to get your sample resequenced at no cost. Otherwise you will need to:

a. Drop data from lane 1 (it is likely going to have N's elsewhere in the main reads; remove --no-lane-splitting so that each lane gets its own files). Have you done a fastqc check? Only if there are no N's in the main reads could you use other programs like deML to see if you are able to demultiplex data from lane 1.
b. Resequence the sample on a new flowcell, if you must reach a certain yield.
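
For the check in option a, a minimal Python sketch of counting how many reads in a FASTQ file contain at least one N could look like the following. It assumes the usual four-line FASTQ records (which is what bcl2fastq writes); the function name count_reads_with_n is made up for illustration and is not part of bcl2fastq or fastqc.

```python
import gzip
from pathlib import Path


def count_reads_with_n(fastq_path):
    """Return (reads_with_N, total_reads) for a plain or gzipped FASTQ file.

    Assumes four-line records: header, sequence, '+', qualities.
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    reads_with_n = 0
    total_reads = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if line_number % 4 == 1:
                total_reads += 1
                if "N" in line:
                    reads_with_n += 1
    return reads_with_n, total_reads
```

Running this on the per-lane read 1 and read 2 files (for example count_reads_with_n("lane1_R1.fastq.gz"), where the file name is only a placeholder) gives a quick number to compare between lane 1 and lane 2 alongside the fastqc report.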

ADD COMMENT • link 5 weeks ago by GenoMax 148k

Thank you very much for your answer!

We have contacted the sequencing facility, and they provided the report below, saying the problem might have happened during the in silico analysis:

enter image description here

I have checked the fastqc output, and the main reads also have Ns in undetermined1/2.gz.

ADD REPLY • link 5 weeks ago by kilcdincer ▴ 10

I have checked the fastqc output, and the main reads also have Ns

If that is the case, there was a problem with lane 1. Does it extend to lane 2 also? It is difficult to discern a general issue from the plots above, but it looks like a few cycles during the index sequencing were affected. The resolution here may depend on the relationship you have with the sequencing facility. Ideally this pool should be resequenced.

ADD REPLY • link 5 weeks ago by GenoMax 148k

There are no Ns in lane 2. This is supposed to be shallow WES sequencing, according to the info below that was given to me:

20040719 NovaSeq 6000 SP Reagent Kit v1.5 (200 cycles)

NovaSeq6000 SP [88.8M reads / sample --> ~1.8X]

ADD REPLY • link 5 weeks ago by kilcdincer ▴ 10

score 1 · Answer 2 · 2024-11-14

5 weeks ago

swbarnes2 14k

You have only 8 samples? You might be able to demultiplex with index1 alone.

Regardless of what their pretty pictures show, the demultiplexing report clearly shows that index 2 is a mess in lane 1. And there's nothing wrong with your in silico analysis when bcl2fastq works fine on one lane.
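
Whether index 1 alone can separate the samples depends on how different the index 1 sequences are from each other. As a rough sketch (the barcodes below are hypothetical placeholders, not the ones from this run), the minimum pairwise Hamming distance can be checked like this. A minimum of 0 means two samples share index 1, and a minimum of at least 3 is what is needed to allow one mismatch without ambiguity.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the list."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Hypothetical index 1 sequences; replace with the index column of the sample sheet.
index1 = ["ATCACGAT", "CGATGTCC", "TTAGGCAG", "TGACCAGT"]
print(min_pairwise_distance(index1))
```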

ADD COMMENT • link 5 weeks ago by swbarnes2 14k

While that can be done, if there are N's in read 1 there is no point in using that data.

ADD REPLY • link 5 weeks ago by GenoMax 148k

A single N can be tolerated in index 1. A lot of those Lane 1 reads have just the one N in index 1, but otherwise match what they should. Those reads will demultiplex properly if one mismatch is allowed. Looks like you have enough diversity to trim off the last base of index 1 altogether, and it should still work.
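
To illustrate the logic (this is only a sketch, not how bcl2fastq implements it), the snippet below assigns an observed index 1 read to a sample when exactly one sample is within the allowed number of mismatches, counting an N as a mismatch, and optionally ignores the last base(s). The sample names, sequences, and the function assign_sample are hypothetical. In bcl2fastq itself the corresponding knobs would be --barcode-mismatches and --use-bases-mask.

```python
def mismatches(observed, expected):
    """Count differing positions; an N in the observed index counts as a mismatch."""
    return sum(o != e for o, e in zip(observed, expected))


def assign_sample(observed, sample_indexes, max_mismatches=1, trim_last=0):
    """Return the single sample within max_mismatches of the observed index, else None.

    trim_last drops that many bases from the end of both sequences before comparing.
    """
    keep = len(observed) - trim_last
    hits = [
        name
        for name, index in sample_indexes.items()
        if len(index) == len(observed)
        and mismatches(observed[:keep], index[:keep]) <= max_mismatches
    ]
    # Ambiguous or unmatched reads stay undetermined.
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet entries (index 1 only).
samples = {"sample_A": "ATCACGAT", "sample_B": "CGATGTCC"}
print(assign_sample("ATCACGAN", samples))               # one N at the last base
print(assign_sample("ATCNCGAN", samples, trim_last=1))  # last base ignored
```

Whether trimming is safe depends on the trimmed index sequences still being distinct enough from each other, which is worth checking against the real sample sheet first.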

ADD REPLY • link 5 weeks ago by swbarnes2 14k

Looking at the Q scores and % bases >Q30 for the "undetermined" line in first screenshot, there is a good chance that there are N's in the main read 1. That is what I was referring to.