Hi,
I'm trying to generate demultiplexed fastq files from my HiSeq4000 run.
I ran 3 paired-end samples in two lanes, indexed by hexamer sequences on both reads. I spiked PhiX into each lane to increase sequence diversity.
Specifically, my fragments look like this:
[6bp-index]-[transcript]-[6bp-index]
The transcript parts of the fragments are near the 3' end so my reads are expected to look like this:
read1 - ran for 110 cycles:
[6bp-index]-[104bp transcript]
read2 ran for 55 cycles:
[6bp-index]-[46bp barcodes]-[3-bp polyA]
The RunInfo.xml file in the run folder says each read is 150 bp, the first index read is 14 bp, and the second is 8 bp:
Read Number="1" NumCycles="150" IsIndexedReads="N"
Read Number="2" NumCycles="14" IsIndexedReads="Y"
Read Number="3" NumCycles="8" IsIndexedReads="Y"
Read Number="4" NumCycles="150" IsIndexedReads="N"
I tried several combinations of SampleSheet settings and the --use-bases-mask argument for bcl2fastq.
For example, under [Reads] in the SampleSheet, only defining read lengths of 150 seems to work:
150
150
And under the [Data] header in the SampleSheet file I define:
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,index2,Sample_Project,Description
1,Lib1,,,,AR006,GCCAAT,GCCAAT,,
2,Lib1,,,,AR006,GCCAAT,GCCAAT,,
1,Lib2,,,,AR008,ACTTGA,ACTTGA,,
2,Lib2,,,,AR008,ACTTGA,ACTTGA,,
1,Lib3,,,,AR012,CTTGTA,CTTGTA,,
2,Lib3,,,,AR012,CTTGTA,CTTGTA,,
And --use-bases-mask Y104N*,I6N8,I6N2,Y46N*
But the only fastq files I'm getting are of the undetermined reads. So my question is whether this is real, meaning I didn't get any of my expected reads and basically only sequenced PhiX, or whether I am specifying the SampleSheet and --use-bases-mask parameters incorrectly.
Thanks a lot
Just want to confirm that your indexes are "inline" (they appear to be) as designed?
The method you are using is for Illumina indexes, which are sequenced as separate index reads (they are never part of the actual read). In your example above, this run was set up as a 150 bp paired-end run with a 14 bp index 1 and an 8 bp index 2. So the pair of Illumina indexes can only be used to separate your samples (assuming each sample was labeled with two barcodes). After that point you will need to deal with your inline barcodes separately.
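To illustrate what "dealing with inline barcodes separately" could look like, here is a minimal Python sketch. It assumes the first 6 bases of each read in a pair are the inline index, that they appear in the same orientation as the sequences listed in the sample sheet, and that exact matching is enough. The function name and the sample table are hypothetical, made up for this example, and not part of bcl2fastq.

```python
# Hypothetical lookup table built from the index/index2 columns in the question.
SAMPLES = {
    ("GCCAAT", "GCCAAT"): "Lib1",
    ("ACTTGA", "ACTTGA"): "Lib2",
    ("CTTGTA", "CTTGTA"): "Lib3",
}


def assign_by_inline_index(read1_seq, read2_seq, samples=SAMPLES, index_len=6):
    """Return (sample, trimmed_read1, trimmed_read2) for one read pair.

    Assumption: the inline index is the first `index_len` bases of each read.
    Pairs whose index combination is not in `samples` go to "Undetermined".
    """
    key = (read1_seq[:index_len], read2_seq[:index_len])
    sample = samples.get(key, "Undetermined")
    # Trim the index off so only the insert sequence remains.
    return sample, read1_seq[index_len:], read2_seq[index_len:]


if __name__ == "__main__":
    # Toy sequences, not real data.
    print(assign_by_inline_index("GCCAATACGTACGT", "GCCAATTTTTGGGG"))
```

A real run would also need to trim the quality strings by the same amount and probably tolerate a mismatch in the index; dedicated trimming/demultiplexing tools handle both.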
Can you post a snippet of the reads from one of your undetermined reads files? The reads should have what the sequencer read as indexes in the fastq header. They would be concatenated as
index1index2
in one stretch of (14+8 or 13+7) bases. Snippets of all four of them, just to be safe.
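If it helps, a quick way to see which index sequences dominate an undetermined file is to tally the index field of the headers. The sketch below assumes the index is whatever follows the last colon on each header line (the usual Illumina-style header layout) and that every record is exactly four lines; the function names are my own.

```python
from collections import Counter
from itertools import islice


def index_from_header(header):
    """Return the text after the last ':' of a fastq header line.

    Assumption: the index (or index pair) is the last colon-separated field.
    """
    return header.rstrip().rsplit(":", 1)[-1]


def tally_indexes(lines, top=10):
    """Count index strings over the header lines of a 4-line-per-record fastq."""
    headers = islice(lines, 0, None, 4)  # every 4th line, starting with the first
    counts = Counter(index_from_header(h) for h in headers)
    return counts.most_common(top)


if __name__ == "__main__":
    import gzip
    import sys

    # Usage: python tally_indexes.py <undetermined fastq.gz>
    with gzip.open(sys.argv[1], "rt") as handle:
        for index, count in tally_indexes(handle):
            print(index, count)
```

If the most frequent entries look nothing like the indexes in the sample sheet, that points at the index definition rather than at the library.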
Lib1 read1:
Lib1 read2:
Lib2 read1:
Lib2 read2:
That change produces this error: std::exception::what: UseBasesMask formatting error. Mask size does not match number of cycles in RunInfo.xml. RunInfo.xml cycles: 150 Base mask:
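For what it's worth, the error is about the mask not adding up to the cycle counts listed in RunInfo.xml, read by read. Below is a rough Python sketch of that arithmetic. It is not bcl2fastq's own parser; it assumes each comma-separated segment is made of Y/I/N letters followed by an optional count or a single "*" (meaning "the rest of this read"), and that a bare letter stands for one cycle. Function names are hypothetical.

```python
import re


def expand_segment(segment, num_cycles):
    """Expand one mask segment (e.g. 'Y104N*') to one letter per cycle."""
    seg = segment.upper()
    tokens = re.findall(r"([YIN])(\d+|\*)?", seg)
    # Reject anything the pattern did not fully cover, e.g. a leading number.
    if "".join(kind + count for kind, count in tokens) != seg:
        raise ValueError(f"cannot parse mask segment {segment!r}")
    if sum(count == "*" for _, count in tokens) > 1:
        raise ValueError(f"more than one '*' in {segment!r}")
    # Cycles claimed by everything except the wildcard.
    fixed = sum(int(count) if count else 1 for _, count in tokens if count != "*")
    if fixed > num_cycles:
        raise ValueError(f"{segment!r} needs {fixed} cycles, read has {num_cycles}")
    expanded = ""
    for kind, count in tokens:
        if count == "*":
            n = num_cycles - fixed
        else:
            n = int(count) if count else 1
        expanded += kind * n
    if len(expanded) != num_cycles:
        raise ValueError(f"{segment!r} covers {len(expanded)} of {num_cycles} cycles")
    return expanded


def check_mask(mask, cycles_per_read):
    """Check a full --use-bases-mask string against per-read cycle counts."""
    segments = mask.split(",")
    if len(segments) != len(cycles_per_read):
        raise ValueError("mask has a different number of reads than RunInfo.xml")
    return [expand_segment(s, c) for s, c in zip(segments, cycles_per_read)]


if __name__ == "__main__":
    # Cycle counts as listed in the RunInfo.xml quoted in the question.
    for read in check_mask("Y104N*,I6N8,I6N2,Y46N*", [150, 14, 8, 150]):
        print(len(read), read[:12] + "...")
```

Running a candidate mask through something like this before launching bcl2fastq makes it easy to spot a segment that is too long, too short, or missing for one of the four reads.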
Please use ADD REPLY/ADD COMMENT to provide additional information on existing posts.
Edit the RunInfo.xml to the following and try @Harold's solution again (please save a copy of the original file under a new name first).
Your sequences will be contained in the R1 and R4 files (and the header inside the file will say 4:N: etc. for file R4).