Hi all,
I am trying to demultiplex a Smart-Seq2 library in order to obtain one FASTQ file per well using bcl2fastq
v2.20. Wells were dual-indexed with an i7 and i5 index.
This is the command I am using:
bcl2fastq -R 210304_NS500188_0399_AHNLT5AFX2 \
--sample-sheet 210304_NS500188_0399_AHNLT5AFX2/SampleSheet.csv \
--barcode-mismatches 0 --no-lane-splitting -o fastqs --use-bases-mask Y76,I8,I8,Y76
Here is the head of my SampleSheet.csv:
[Header]
FileFormatVersion,2
RunName,210304_NS500188_0399_AHNLT5AFX2
InstrumentType,NextSeq2000
[Reads]
Read1Cycles,76
Read2Cycles,76
Index1Cycles,8
Index2Cycles,8
[Data]
Sample_ID,Sample_Name,Sample_Project,I7_Index_ID,index,I5_Index_ID,index2
A1,A1,SmartSeq2,701,TAAGGCGA,501,TAGATCGC
A2,A2,SmartSeq2,702,CGTACTAG,501,TAGATCGC
A3,A3,SmartSeq2,703,AGGCAGAA,501,TAGATCGC
...
This runs without throwing any errors. However, it only produces Unassigned_S0_R[12]_001.fastq.gz
files and no demultiplexed FASTQs.
Looking at the "Unassigned" FASTQs, I notice that the second index consistently starts with two N's throughout the file:
@NS500188:399:HNLT5AFX2:1:11101:1879:1068 1:N:0:TGCTGGGT+NNGATCTA
CTAATNAAGTGTGAGATCTTTGACCTCAAGATCCTTTGAGAATTCCTGCTTTTTCTGCAGCACATATTTGTGTCAT
+
6AAAA#/AEEEEEEEEEEEEEEAAAEAE6AAEE/EEE/EEE/EE6/EEEEEEEEEEEEE/EA<AEAEEEEEEE/EE
@NS500188:399:HNLT5AFX2:1:11101:7780:1068 1:N:0:GTGTGGTG+NNTGCAGT
ACCAANTGCTGGGATTACAGGTGCCCACCACCACACTCAGCTACTTTTCTGTAGAGACAAGGTTTCGCCATGTTGC
+
AAAAA#EEAEEEEEEEEEEEEEEAE//EEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NS500188:399:HNLT5AFX2:1:11101:4719:1069 1:N:0:GGGGGGGG+NNATCTCG
GCAATNGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACAC
Thus, it is not surprising that these reads cannot be assigned using the index.
Can anybody tell me how these N's might have been introduced and how to solve this issue? Thank you in advance!
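As a quick sanity check on the read headers above, the shell sketch below compares index 2 of the first read with the reverse complement of the index2 value listed for A1 in the sample sheet. It assumes `rev` and `tr` are available; `revcomp` is a hypothetical helper name. Whether this instrument actually reads i5 in reverse-complement orientation is an assumption to verify, not a confirmed diagnosis.

```shell
# Reverse-complement a DNA index sequence (uppercase A/C/G/T; N is left as N).
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# index2 of sample A1 in the sample sheet, and index 2 of the first read shown above.
sheet_i5="TAGATCGC"
read_i5="NNGATCTA"

rc=$(revcomp "$sheet_i5")
echo "reverse complement: $rc"

# Compare only cycles 3-8, because the first two cycles were called as N.
if [ "${rc#??}" = "${read_i5#??}" ]; then
    echo "last 6 bases match the reverse complement"
fi
```

If the last six bases agree, the index2 column may need to be listed in reverse-complement orientation, independently of how the two N's are handled.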
Are there 2 N's at that position in the second index across the entire dataset? If your index set supports it, you could try
--barcode-mismatches 2
and see if that helps. If the i5 indexes are identical for all samples, then you could simply use the first index to demultiplex.

Yes, the 2 N's are always at that position in the second index in the entire dataset. Unfortunately, I cannot set
--barcode-mismatches 2
because of the similarity among the first indexes. I use four different i5 indexes in total. Is there a way to set --barcode-mismatches 2
only for i5? Or can I somehow trim the i5 index to 6 bases instead of 8?
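
Both ideas could be sketched roughly as follows. This is untested against this run. To my understanding, bcl2fastq accepts a comma-delimited list for --barcode-mismatches (one value per index) and an `n` in --use-bases-mask to ignore cycles, but please check both against `bcl2fastq --help`. `trim_index2` and the output file name are hypothetical. The function assumes index2 is the 7th column and that the sample sheet lists it in the orientation in which it is read, so the first two bases are the ones to drop.

```shell
# Write a copy of the sample sheet in which the index2 column (column 7)
# of every [Data] row has its first two bases removed.
trim_index2() {
    awk -F',' -v OFS=',' '
        /^\[/                   { in_data = ($1 == "[Data]"); header_seen = 0; print; next }
        in_data && !header_seen { header_seen = 1; print; next }   # column header row
        in_data && NF >= 7      { $7 = substr($7, 3) }             # keep bases 3..end
        { print }
    ' "$1"
}

# Possible usage (commented out; not verified on this data set):
#
# Option 1: ignore the first two cycles of index 2 and match on the remaining six.
#   trim_index2 SampleSheet.csv > SampleSheet.i5trim.csv
#   bcl2fastq -R 210304_NS500188_0399_AHNLT5AFX2 --sample-sheet SampleSheet.i5trim.csv \
#     --barcode-mismatches 0 --no-lane-splitting -o fastqs --use-bases-mask Y76,I8,n2I6,Y76
#
# Option 2: keep all eight cycles, allow 0 mismatches for i7 and 2 for i5.
#   bcl2fastq ... --barcode-mismatches 0,2 --use-bases-mask Y76,I8,I8,Y76
```

Option 2 only makes sense if the four i5 indexes differ from each other at enough positions to tolerate two mismatches. Option 1 avoids that constraint, but the six remaining i5 bases must still be distinct between samples.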