7.7 years ago
jomo018
I received 150 bp SE NextSeq FASTQ files from two similar experiments. Reads from the first experiment are structured: 5' primer -- DNA -- 3' primer -- indexed adapter (partial, no index). Reads from the second experiment are structured: DNA -- 3' primer -- indexed adapter (almost complete, index included).
The reads of the first experiment are useless as they do not contain the index within the adapter. What controls the location of the "limited view window" within the sequence?
Index reads (1D/2D) in Illumina technology are read independently of the main reads and should never be part of the actual sequence (unless you are using in-line barcodes). If you have indexed samples, then the run has to be set up as such.
I am referring to TruSeq indexed adapters, which are 63 bases long with 6 bases in the middle acting as the index.
If you start seeing adapter on the 3'-end of a read, that means your inserts are shorter than the length of sequencing being done (http://nextgen.mgh.harvard.edu/IlluminaChemistry.html).
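To make the "view window" arithmetic concrete, here is a minimal Python sketch. It assumes read 1 starts at the first base of the construct and simply runs for the read length, so any adapter bases you see are whatever is left over after the construct. The function names are hypothetical, and the default `index_offset=33` (position of the 6-base index inside the adapter) is an assumption that you should adjust to your own adapter sequence.

```python
def adapter_bases_in_read(read_len, construct_len):
    """Number of adapter bases that fall inside the read window.

    construct_len is everything sequenced before the adapter starts
    (e.g. 5' primer + DNA + 3' primer).
    """
    return max(0, read_len - construct_len)


def index_is_covered(read_len, construct_len, index_offset=33, index_len=6):
    """True if the whole adapter index lies inside the read window.

    index_offset is an ASSUMED position of the index within the adapter.
    """
    index_end = construct_len + index_offset + index_len
    return index_end <= read_len


# Hypothetical construct lengths, for illustration only.
for construct_len in (100, 130):
    print(
        construct_len,
        adapter_bases_in_read(150, construct_len),
        index_is_covered(150, construct_len),
    )
```

With a fixed 150-cycle read, a longer construct (for example one that also carries a 5' primer) pushes the index out of the window. This only describes read-through; it is not how the index is normally obtained.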
150 bases are indeed shorter than the complete sequence. Using the colors in the bottom picture of the link you sent: in the second experiment I see black - purple - blue - yellow (partial), which is OK because I don't need the red, and the blue (de-multiplexing index) is included. In the first experiment I see red - black - purple (partial), which is no good because the blue is missing. So how can I make sure the next experiment will indeed include the blue index?
To get the "index" the sequencing run has to be set up as "multiplexed". So you will specify 150 bp x N bp (N would be the length of the index you would want) as run-requirement (if you only want single-end reads). Was this not specified the first time around?
I don't know. I have access to the bcl run folder. Can this information be seen in one of the files, e.g. RunParameters.xml or RTAConfiguration.xml?
Look in the RunInfo.xml file (should be in the top-level FC folder) to see if you can find a block like this:
Both experiments have the same block:
That shows that both of these are dual-indexed (2D) runs and should have the same kind of reads. The reads go in the order
Read 1 --> Index 1 --> Index 2 --> Read 2 (which you don't have)
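If you want to check the read layout programmatically, a rough Python sketch is below. The sample XML is made up for illustration (the run ID and cycle counts are assumptions, not the values from this thread); only the `Read` element with its `Number`, `NumCycles` and `IsIndexedRead` attributes follows the usual RunInfo.xml layout. `describe_reads` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative RunInfo.xml fragment; all values here are ASSUMED.
SAMPLE_RUNINFO = """<RunInfo>
  <Run Id="hypothetical_run" Number="1">
    <Reads>
      <Read Number="1" NumCycles="150" IsIndexedRead="N" />
      <Read Number="2" NumCycles="6" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="6" IsIndexedRead="Y" />
    </Reads>
  </Run>
</RunInfo>
"""


def describe_reads(xml_text):
    """Return sorted (read number, cycles, is_index_read) tuples."""
    root = ET.fromstring(xml_text)
    reads = []
    for read in root.iter("Read"):
        reads.append(
            (
                int(read.get("Number")),
                int(read.get("NumCycles")),
                read.get("IsIndexedRead") == "Y",
            )
        )
    return sorted(reads)


for number, cycles, is_index in describe_reads(SAMPLE_RUNINFO):
    kind = "index" if is_index else "sequence"
    print(f"Read {number}: {cycles} cycles ({kind})")
```

For a real run, read the file from the run folder and pass its text to `describe_reads` instead of the sample string.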
So you have one sequence file per sample? Can you post the first few lines of the file:
(z)cat your_R1_fastq(.gz) | head -8
FASTQ files were extracted with bcl2fastq without a SampleSheet. I guess I am not losing any information.
The run including index
The run with two primers and no index
If you were not interested in separating the samples then yes. But if you want to split the samples then you would need to re-run the demultiplexing using a proper samplesheet.
Otherwise this data is a mix of multiple samples which you can't tell apart by sequence you have in hand.
So you are saying that even though the Truseq indexed adapter (with its 6-base index) is missing from the reads as I see them after bcl2fastq-without-SampleSheet, a bcl2fastq with SampleSheet would still be able to do the demultiplexing? Where would it take the index from?
As a side note, I am doing the demultiplexing with a custom script targeting the 3' primer, which is always present and unique.
If you have a few minutes available, watch this short Illumina video (starting about 2 min in). It will help clarify the order of sequencing I posted above.
The only reason you may see the indexes in your reads (produced without a sample sheet) is that those inserts were shorter than the length of sequencing. Whoever ran this run should be able to help you get a samplesheet in the right format (in fact, it should already be in the raw folder you have, assuming it is a complete copy; it should be called SampleSheet.csv).
There is no point in using custom scripts for demultiplexing data (unless one is using internal non-Illumina barcodes). Demultiplexing with bcl2fastq will make sure that the demux is handled properly.
Thank you for pointing to the video. Could you possibly explain how to include a "63-base TruSeq indexed adapter with a six-base index embedded" within the SampleSheet? I have found no examples for this case and IEM doesn't help. The adapter is:
You don't need to do that. Just include the indexes for the samples. You can generate an example samplesheet from IEM to get general structure and then look at the examples here. There are slightly different samplesheets for CASAVA and bcl2fastq.
Your run data folder may already have a samplesheet (SampleSheet.csv) so look for that first.
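If no samplesheet turns up, the "just include the indexes" advice can be sketched roughly as below in Python. This is an assumption-laden illustration, not an official template: the sample names and index sequences are placeholders, `build_samplesheet` is a hypothetical helper, and only the `[Data]` section with `Sample_ID`, `Sample_Name` and `index` columns is shown. A dual-indexed sheet would typically also carry an `index2` column. Check the result against an IEM-generated sheet before using it.

```python
import csv
import io


def build_samplesheet(samples):
    """Build a minimal [Data] section listing one index per sample.

    samples: iterable of (sample_id, index_sequence) pairs.
    Only the short index goes in, not the full adapter sequence.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["[Data]"])
    writer.writerow(["Sample_ID", "Sample_Name", "index"])
    for sample_id, index in samples:
        writer.writerow([sample_id, sample_id, index])
    return buf.getvalue()


# Placeholder samples and indexes, for illustration only.
sheet = build_samplesheet([("sample_A", "ATCACG"), ("sample_B", "CGATGT")])
print(sheet)
```

The point of the sketch is that the sheet maps each sample to its 6-base index only; the surrounding adapter sequence never appears in it.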