
Demultiplexing of Smart-Seq2 library using bcl2fastq


Asked 3.7 years ago by gnomee

Hi all,

I am trying to demultiplex a Smart-Seq2 library with bcl2fastq v2.20 to obtain one FASTQ file per well. Each well was dual-indexed with an i7 and an i5 index.

This is the command I am using:

bcl2fastq -R 210304_NS500188_0399_AHNLT5AFX2 \
--sample-sheet 210304_NS500188_0399_AHNLT5AFX2/SampleSheet.csv \
--barcode-mismatches 0 --no-lane-splitting -o fastqs --use-bases-mask Y76,I8,I8,Y76
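For reference, `--use-bases-mask` takes one mask per read, in sequencing order (Read 1, Index 1, Index 2, Read 2): `Y` cycles are written to the FASTQ, `I` cycles are used as index, and `n` cycles are ignored. Below is a minimal Python sketch, using hypothetical helper names that are not part of bcl2fastq, that expands a simple mask and checks it against the cycle counts in the [Reads] section. It only handles the letter-plus-count syntax used in this thread.

```python
import re

# One mask token: a letter (Y = read, I = index, N = ignore) plus an optional count.
TOKEN = re.compile(r"([YIN])(\d*)", re.IGNORECASE)

def expand_segment(segment: str) -> str:
    """Expand one read's mask, e.g. 'nnI6' -> 'nnIIIIII' (hypothetical helper)."""
    if not re.fullmatch(r"(?:[YIN]\d*)+", segment, re.IGNORECASE):
        raise ValueError(f"unsupported mask segment: {segment!r}")
    return "".join(letter * (int(count) if count else 1)
                   for letter, count in TOKEN.findall(segment))

def expand_bases_mask(mask: str) -> list[str]:
    """Expand a comma-separated mask into one string per read."""
    return [expand_segment(seg) for seg in mask.split(",")]

def matches_run(mask: str, cycles: list[int]) -> bool:
    """True if every mask segment covers exactly the cycles of its read."""
    return [len(s) for s in expand_bases_mask(mask)] == cycles

# Cycle counts from the [Reads] section, in sequencing order:
# Read 1, Index 1 (i7), Index 2 (i5), Read 2.
run_cycles = [76, 8, 8, 76]
print(matches_run("Y76,I8,I8,Y76", run_cycles))
```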

Here is the head of my SampleSheet.csv:

[Header]
FileFormatVersion,2
RunName,210304_NS500188_0399_AHNLT5AFX2
InstrumentType,NextSeq2000
[Reads]
Read1Cycles,76
Read2Cycles,76
Index1Cycles,8
Index2Cycles,8
[Data]
Sample_ID,Sample_Name,Sample_Project,I7_Index_ID,index,I5_Index_ID,index2
A1,A1,SmartSeq2,701,TAAGGCGA,501,TAGATCGC
A2,A2,SmartSeq2,702,CGTACTAG,501,TAGATCGC
A3,A3,SmartSeq2,703,AGGCAGAA,501,TAGATCGC
...
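Before running bcl2fastq, it can help to load the [Data] rows and confirm that every i7+i5 combination is unique. Here is a minimal Python sketch, assuming the layout shown above (a [Data] section with `index` and `index2` columns). The function names are hypothetical. For a real file, pass `open("SampleSheet.csv").read()` instead of the inline excerpt.

```python
import csv
import io

def read_data_section(text: str) -> list[dict]:
    """Return the rows of the [Data] section as dicts keyed by column name."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Spreadsheet tools often write '[Data],,,,' so ignore trailing commas here.
    start = next(i for i, ln in enumerate(lines) if ln.rstrip(",") == "[Data]") + 1
    # The section ends at the next '[...]' header line or at end of file.
    end = next((i for i in range(start, len(lines)) if lines[i].startswith("[")),
               len(lines))
    body = "\n".join(ln for ln in lines[start:end] if ln)
    return list(csv.DictReader(io.StringIO(body)))

def duplicate_index_pairs(rows: list[dict]) -> list[tuple[str, str]]:
    """Return (first_sample, clashing_sample) for repeated i7+i5 combinations."""
    seen, clashes = {}, []
    for row in rows:
        key = (row["index"], row["index2"])
        if key in seen:
            clashes.append((seen[key], row["Sample_ID"]))
        else:
            seen[key] = row["Sample_ID"]
    return clashes

# Excerpt of the sample sheet from the question (the elided rows are not included).
SHEET = """[Header]
FileFormatVersion,2
[Reads]
Read1Cycles,76
[Data]
Sample_ID,Sample_Name,Sample_Project,I7_Index_ID,index,I5_Index_ID,index2
A1,A1,SmartSeq2,701,TAAGGCGA,501,TAGATCGC
A2,A2,SmartSeq2,702,CGTACTAG,501,TAGATCGC
A3,A3,SmartSeq2,703,AGGCAGAA,501,TAGATCGC
"""
rows = read_data_section(SHEET)
print(len(rows), duplicate_index_pairs(rows))  # 3 []
```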

This runs without errors; however, it only produces Unassigned_S0_R[12]_001.fastq.gz files and no demultiplexed FASTQs. Looking at the "Unassigned" FASTQs, I noticed that the second index consistently starts with two N's throughout the file:

@NS500188:399:HNLT5AFX2:1:11101:1879:1068 1:N:0:TGCTGGGT+NNGATCTA
CTAATNAAGTGTGAGATCTTTGACCTCAAGATCCTTTGAGAATTCCTGCTTTTTCTGCAGCACATATTTGTGTCAT
+
6AAAA#/AEEEEEEEEEEEEEEAAAEAE6AAEE/EEE/EEE/EE6/EEEEEEEEEEEEE/EA<AEAEEEEEEE/EE
@NS500188:399:HNLT5AFX2:1:11101:7780:1068 1:N:0:GTGTGGTG+NNTGCAGT
ACCAANTGCTGGGATTACAGGTGCCCACCACCACACTCAGCTACTTTTCTGTAGAGACAAGGTTTCGCCATGTTGC
+
AAAAA#EEAEEEEEEEEEEEEEEAE//EEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NS500188:399:HNLT5AFX2:1:11101:4719:1069 1:N:0:GGGGGGGG+NNATCTCG
GCAATNGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACAC

Thus, it is not surprising that these reads cannot be assigned to a sample by their index.
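Checking whether the N's always sit at the same i5 positions is easier over the whole file than by eye. Here is a minimal Python sketch that reads the header lines of a gzipped FASTQ and counts, per position, how often the i5 base is N. It assumes the Illumina header comment format seen above (`1:N:0:<i7>+<i5>`), and `fastq_headers` and `index2_n_profile` are hypothetical helper names. The example runs on the three headers quoted above. For the real file, pass `fastq_headers(path)` with the path to your unassigned R1 FASTQ.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator

def fastq_headers(path: str) -> Iterator[str]:
    """Yield the header line of every record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # a FASTQ record is 4 lines; the first is the header
                yield line.rstrip("\n")

def index2_n_profile(headers: Iterable[str]) -> tuple[int, Counter]:
    """Count, per 1-based position, how often the i5 index in the header is 'N'."""
    total, n_counts = 0, Counter()
    for header in headers:
        # The index pair follows the last colon of the header comment.
        index_field = header.split()[-1].rsplit(":", 1)[-1]
        _, _, i5 = index_field.partition("+")
        total += 1
        for pos, base in enumerate(i5, start=1):
            if base == "N":
                n_counts[pos] += 1
    return total, n_counts

headers = [
    "@NS500188:399:HNLT5AFX2:1:11101:1879:1068 1:N:0:TGCTGGGT+NNGATCTA",
    "@NS500188:399:HNLT5AFX2:1:11101:7780:1068 1:N:0:GTGTGGTG+NNTGCAGT",
    "@NS500188:399:HNLT5AFX2:1:11101:4719:1069 1:N:0:GGGGGGGG+NNATCTCG",
]
total, n_counts = index2_n_profile(headers)
for pos in sorted(n_counts):
    print(f"i5 position {pos}: {n_counts[pos]}/{total} reads are N")
```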

Can anybody tell me how these N's might have been introduced and how to solve this issue? Thank you in advance!

Tags: bcl2fastq, smart-seq2, sequencing



Are the 2 N's at that position of the second index throughout the entire dataset? If your index set supports it, you could try --barcode-mismatches 2 and see if that helps. If the i5 indexes are identical for all samples, you could simply use the first index to demultiplex.
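Whether a given `--barcode-mismatches` value is usable depends on how far apart the indexes within one index read are. Here is a minimal Python sketch, assuming the usual reasoning that allowing m mismatches can make two barcodes ambiguous whenever their Hamming distance is 2m or less, because their m-mismatch neighbourhoods overlap. bcl2fastq performs its own barcode-collision check, so treat this only as a quick pre-check. The function names are hypothetical.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length indexes differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(indexes: list[str]) -> int:
    """Smallest Hamming distance between any two distinct indexes (needs at least two)."""
    distinct = sorted(set(indexes))
    return min(hamming(a, b) for a, b in combinations(distinct, 2))

def max_safe_mismatches(indexes: list[str]) -> int:
    """Largest m with 2*m < minimum distance, i.e. no overlapping neighbourhoods."""
    return (min_pairwise_distance(indexes) - 1) // 2

# Only the three i7 indexes visible in the sample sheet excerpt; the full set
# (elided as "...") must be checked before relying on the result.
i7 = ["TAAGGCGA", "CGTACTAG", "AGGCAGAA"]
print(min_pairwise_distance(i7), max_safe_mismatches(i7))
```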

Reply by GenoMax, 3.7 years ago


Yes, the 2 N's are always at that position in the second index across the entire dataset. Unfortunately, I cannot set --barcode-mismatches 2 because the first (i7) indexes are too similar to each other. I use four different i5 indexes in total. Is there a way to set --barcode-mismatches 2 only for i5? Or can I somehow trim the i5 index to 6 bases instead of 8?

Reply by gnomee, 3.7 years ago

Accepted answer by GenoMax, 3.7 years ago (score 2, 2021-03-08)

You can't set barcode mismatches only for i5. Can you try this bases mask: --use-bases-mask Y76,I8,nnI6,Y76? The nn skips the first two i5 cycles and I6 uses the remaining six as the index. This could be a problem if the i5 indexes are not unique once their first 2 bases are removed.
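The uniqueness condition can be checked before re-running bcl2fastq. Here is a minimal Python sketch using a hypothetical helper, `trimmed_collisions`. Pass it the i5 sequences in the same orientation as they appear in the index 2 read, because `nn` drops the first two cycles actually sequenced. An empty result means the trimmed 6-base indexes are still unique.

```python
from collections import defaultdict

def trimmed_collisions(indexes: list[str], skip: int = 2) -> dict[str, list[str]]:
    """Group distinct indexes by the sequence left after dropping the first `skip` bases.

    Any group with more than one member would be indistinguishable under a
    mask such as nnI6 (skip=2 on an 8-base index).
    """
    groups = defaultdict(set)
    for seq in indexes:
        groups[seq[skip:]].add(seq)
    return {tail: sorted(seqs) for tail, seqs in groups.items() if len(seqs) > 1}

# Hypothetical i5 sequences for illustration only (not the poster's indexes):
# the first two differ only in their first two bases.
example = ["AAGATCGC", "CCGATCGC", "GGCCTTAA"]
print(trimmed_collisions(example))
```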

This sounds like a loading or instrument issue. Is it possible to consult with Illumina or your sequencing provider to see if they can check for overloading or other issues?



With this mask, and after adjusting the sample sheet accordingly, I do get properly demultiplexed FASTQs. Fortunately, the i5 indexes do not clash when their first two bases are removed. I will still check with the facility about what the issue could have been. Thank you for your help!
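The sample-sheet adjustment can also be sketched in Python. With `nnI6`, bcl2fastq compares six index 2 bases (cycles 3 to 8), so each `index2` entry has to become the six bases the instrument reads there. The headers quoted in the question offer a clue: the i5 read `NNGATCTA` matches the last six bases of `GCGATCTA`, the reverse complement of the listed `TAGATCGC`. The `revcomp` option below covers that case, but it is an assumption to verify against your own reads. The function names are hypothetical, and only the [Data] rows are rewritten. The other sections would need to be copied back unchanged.

```python
import csv
import io

# Complement table for uppercase DNA bases (N maps to itself).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def trim_index2(rows: list[dict], skip: int = 2, revcomp: bool = False) -> list[dict]:
    """Return copies of the [Data] rows with index2 reduced to the bases kept by the mask."""
    trimmed = []
    for row in rows:
        seq = reverse_complement(row["index2"]) if revcomp else row["index2"]
        trimmed.append({**row, "index2": seq[skip:]})
    return trimmed

def data_rows_to_csv(rows: list[dict]) -> str:
    """Serialise rows back to CSV text for the [Data] section."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# First row of the sample sheet from the question.
rows = [
    {"Sample_ID": "A1", "Sample_Name": "A1", "Sample_Project": "SmartSeq2",
     "I7_Index_ID": "701", "index": "TAAGGCGA", "I5_Index_ID": "501",
     "index2": "TAGATCGC"},
]
print(data_rows_to_csv(trim_index2(rows, revcomp=True)))
```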

Reply by gnomee, 3.7 years ago