print(matched_barcodes)
Unknown_Barcode Match
1 AAGACACT+TGCTGTCA No Match
2 GTCCACAG+TATTCGCG No Match
3 GGGGGGGG+AGATCTCG No Match
4 TCTGCAAG+AAGGTGAA No Match
5 ATTATGTT+GAATACAG No Match
6 AGTAAGCG+GCGAATGA No Match
7 CACATCCT+ATGGTATT No Match
8 ACACGATC+AGTGCAGC No Match
9 AGCATGGA+GTAGCGCT No Match
10 CAGCAAGG+GGTAGAGG No Match
These are the top unknown barcodes, which I tried to find in the samplesheet that was used to generate the FASTQ files, since most of the reads are going into the Undetermined FASTQ. Next, I looked for partial matches across the samplesheet.
This is what I get:
Sample index index2 Unknown_Barcode Match_Type
Samp_B CACATCCT AGTGCAGC CACATCCT+ATGGTATT index1
Samp_C ACACGATC ATGGTATT CACATCCT+ATGGTATT index2
Samp_D ACACGATC ATGGTATT ACACGATC+AGTGCAGC index1
Samp_E CACATCCT AGTGCAGC ACACGATC+AGTGCAGC index2
Samp_F AGCATGGA TTAGCGCT AGCATGGA+GTAGCGCT index1
Samp_G CAGCAAGG GTTAGAGG CAGCAAGG+GGTAGAGG index1
--barcode-mismatches 0 was used.
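To see how far each unknown barcode is from the samplesheet entry it partially matches, it may help to count mismatches per index. Below is a minimal Python sketch using three rows from the table above. The helper names hamming and per_index_distance are illustrative only and are not part of bcl2fastq.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def per_index_distance(observed, index1, index2):
    """Return (mismatches in index1, mismatches in index2).

    `observed` is an 'index1+index2' string as it appears in the FASTQ header.
    """
    obs1, obs2 = observed.split("+")
    return hamming(obs1, index1), hamming(obs2, index2)


# (sample, samplesheet index, samplesheet index2, unknown barcode), copied from the table
rows = [
    ("Samp_B", "CACATCCT", "AGTGCAGC", "CACATCCT+ATGGTATT"),
    ("Samp_F", "AGCATGGA", "TTAGCGCT", "AGCATGGA+GTAGCGCT"),
    ("Samp_G", "CAGCAAGG", "GTTAGAGG", "CAGCAAGG+GGTAGAGG"),
]

for sample, index1, index2, observed in rows:
    print(sample, per_index_distance(observed, index1, index2))
```

For Samp_F and Samp_G the second index differs from the observed one by a single base, which is enough to send those reads to Undetermined when no mismatches are allowed. For Samp_B the second index differs at 5 of 8 positions, which looks more like a wrong index pairing in the sheet than a sequencing error.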
So what is the way to figure out what went wrong with bcl2fastq?
What troubleshooting steps should I follow to fix the samplesheet, or what parameter should I use in bcl2fastq to reduce the Undetermined FASTQ?
Will give it a try and update.
This is what I see when I run demuxbyname.sh
That is very odd. If the files came out of the bcl2fastq run then that should not have happened. Is it possible that your original bcl2fastq run did not completely finish? You could try repair.sh to fix the sync but see how many singletons you end up with.
I would suggest that you first try to rectify the samplesheet based on the indexes you see in the Undetermined file with the awk code I had linked. Then repeat the demultiplexing with bcl2fastq. demuxbyname.sh should be the second option, if you are not able to get bcl2fastq to demultiplex the data properly.
If you made the SampleSheet file up manually then you may want to use Illumina Experiment Manager software (Windows only) to create one, especially if this is something you don't do regularly. As swbarnes2 noted, your index combinations may be shifted by one row or something along that line.
"You could try repair.sh to fix the sync but see how many singletons you end up with." I did try this as well and got the same issue. I will go ahead with your suggestion and try IEM.
"As swbarnes2 noted your index combinations may be shifted by one row or something along that line." Can you explain this? Do you mean this might be a manual error made while creating the samplesheet?
Once you run the awk code on your undetermined_R1 file, show us what you get. Also show us the top 10 rows of the resulting barcodes and the read numbers.
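For readers without the linked awk code at hand, here is a rough Python sketch of the same idea: tally the index field from the read headers of the Undetermined R1 file and list the most frequent combinations. It assumes bcl2fastq-style headers in which the index is the last colon-separated field. The function names and the read names in the example are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_barcodes(lines):
    """Tally the index field of every FASTQ header (every 4th line)."""
    counts = Counter()
    for header in islice(lines, 0, None, 4):
        # Assumes the header ends with ':<index1>+<index2>'
        counts[header.rstrip().rsplit(":", 1)[-1]] += 1
    return counts


def count_barcodes_in_fastq(path):
    """Same tally, reading from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_barcodes(handle)


# Tiny in-memory example; barcodes are from the thread, read names are invented
example = [
    "@M1:1:FC:1:1101:1000:2000 1:N:0:AAGACACT+TGCTGTCA\n", "ACGT\n", "+\n", "IIII\n",
    "@M1:1:FC:1:1101:1001:2001 1:N:0:GTCCACAG+TATTCGCG\n", "ACGT\n", "+\n", "IIII\n",
    "@M1:1:FC:1:1101:1002:2002 1:N:0:AAGACACT+TGCTGTCA\n", "ACGT\n", "+\n", "IIII\n",
]

for barcode, n_reads in count_barcodes(example).most_common(10):
    print(n_reads, barcode)
```

On real data, count_barcodes_in_fastq would be called with the path to the Undetermined R1 file, and the sorted counts are what is worth comparing against the samplesheet.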
You have not said what kind of sequencer this run is from, but most of these index combinations are likely not usable, with only a few reads assigned. You will want to find the most abundant index combinations (sort this output) and then see if you can correlate the top indexes to the samplesheet you were provided. You may need to rev-comp one of the indexes (that is likely the easiest error) or you may need to swap the i7-i5 columns (another common error) to match the samples to the indexes seen.
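The two checks just mentioned (reverse-complementing an index, or swapping the i7 and i5 columns) can be tried systematically. The following Python sketch assumes the samplesheet has been loaded into a dict of sample name to (index, index2). The function names, the sample name and the observed barcodes in the example are hypothetical; only the index pair ACACGATC / ATGGTATT is taken from the table earlier in the thread.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def explain_barcode(observed, samplesheet):
    """List (sample, transformation) pairs under which `observed` matches the sheet.

    `observed` is 'index1+index2' as seen in the Undetermined reads;
    `samplesheet` maps sample name -> (index, index2).
    """
    obs1, obs2 = observed.split("+")
    candidates = {
        "as listed": (obs1, obs2),
        "index2 reverse-complemented": (obs1, revcomp(obs2)),
        "index1 reverse-complemented": (revcomp(obs1), obs2),
        "both reverse-complemented": (revcomp(obs1), revcomp(obs2)),
        "i7/i5 columns swapped": (obs2, obs1),
    }
    hits = []
    for label, pair in candidates.items():
        for sample, indexes in samplesheet.items():
            if tuple(indexes) == pair:
                hits.append((sample, label))
    return hits


sheet = {"Samp_C": ("ACACGATC", "ATGGTATT")}

# Hypothetical observed barcodes, constructed to illustrate each case
print(explain_barcode("ACACGATC+AATACCAT", sheet))
print(explain_barcode("ATGGTATT+ACACGATC", sheet))
```

If the abundant barcodes from the Undetermined file all come back with the same transformation, that is a strong hint about which correction the samplesheet needs. An empty result for the top barcodes suggests the problem lies elsewhere, for example in how the indexes were paired.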
If neither of these cures the problem then you will have to send the index combinations that look real (have a significant number of reads) to the people in the lab so they can figure out what went wrong. What you are handing them is what the sequencer saw (truth), irrespective of what they think should be there (assuming there were no issues with the sequencer run).
Once the corrections are made, it should then just be a matter of creating a new samplesheet and re-demultiplexing the data.