Read lengths greater than insert length
shpak.max, 20 months ago

I have what probably seems like a naive question: what happens if almost all inserts are shorter than the read length (the number of sequencing cycles)?

For instance, the data set I most recently worked with has an average fragment length of 50 bp. Before this was known, the first rounds of sequencing were run with 150 cycles. It seems to me that if the fragment is ~50 bp, the longest read should also end at 50 bp, since the remaining 100 cycles would have no template to generate additional sequence. However, the FASTQ file gives me reads of length 150, with the last ~100 bases being long runs of the same nucleotide.

How are these trailing nucleotides added to the read if there is no insert left to serve as a template? In case it matters, I'm using the Illumina NovaSeq platform.

Sorry if this seems like an uninformed question; perhaps I'm misunderstanding some step of the sequencing process.
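
One quick diagnostic is to tabulate which base the 3' homopolymer tails are made of. If they are G, that is consistent with the two-color chemistry used on NovaSeq, where a cycle with no signal in either channel is called as G, so reads that run off their template often end in poly-G. A minimal sketch, assuming plain sequence strings; the function names `trailing_run` and `tail_summary` and the `min_run` threshold are hypothetical, not taken from any tool:

```python
from collections import Counter
from typing import Iterable, Tuple


def trailing_run(seq: str) -> Tuple[str, int]:
    """Return the base and length of the single-nucleotide run at the 3' end of a read."""
    if not seq:
        return ("", 0)
    last = seq[-1]
    # rstrip removes every trailing copy of `last`; the length difference is the run length.
    return (last, len(seq) - len(seq.rstrip(last)))


def tail_summary(reads: Iterable[str], min_run: int = 10) -> Counter:
    """Count, per base, the reads whose 3' end is a run of at least min_run identical bases."""
    counts: Counter = Counter()
    for seq in reads:
        base, run = trailing_run(seq)
        if run >= min_run:
            counts[base] += 1
    return counts


# Made-up 150-base reads for illustration only.
toy_reads = [
    "ACGT" * 12 + "AC" + "G" * 100,  # 50 informative bases, then a 100-base poly-G tail
    "ACGT" * 37 + "AC",              # no homopolymer tail
]
print(tail_summary(toy_reads))  # Counter({'G': 1})
```

On real data you would feed it the sequence lines of the FASTQ file; the threshold is arbitrary and only separates long tails from short runs that occur by chance.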

GenoMax, 20 months ago

Illumina libraries are made by adding adapters to both ends of each DNA fragment, and those adapters are themselves about 60 nucleotides long. Not all of your library fragments are going to be exactly 50 bp long (there will be a range). Once the sequencer runs out of insert, it starts sequencing into the adapter that was added to the 3' end of the fragment. If the read keeps going past the adapter, you may start getting sequence that is essentially random. In the old days of the GA IIx this showed up as runs of A (AAAA...). Something similar looks to be happening in your case.

Adapter >>>>>-----------------------------<<<<<< Adapter
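
To make the arithmetic concrete, the cycles of one read can be split into three segments: insert, 3' adapter, and whatever lies past the adapter. This is a minimal sketch under two stated assumptions: the read starts exactly at the first base of the insert, and the 3' adapter is a fixed ~60 nt (the figure given in the answer above; real lengths depend on the kit). `read_layout` and `ReadLayout` are hypothetical helpers:

```python
from typing import NamedTuple


class ReadLayout(NamedTuple):
    insert: int   # cycles spent reading the DNA fragment itself
    adapter: int  # cycles spent reading into the 3' adapter
    beyond: int   # cycles past the end of the adapter (no defined template)


def read_layout(insert_len: int, read_len: int = 150, adapter_len: int = 60) -> ReadLayout:
    """Split the cycles of one read into insert, adapter and past-adapter segments.

    ASSUMPTIONS: the read starts at the first base of the insert, and the 3'
    adapter is a fixed adapter_len bases long (~60 nt per the answer above).
    """
    insert = min(insert_len, read_len)
    adapter = min(adapter_len, read_len - insert)
    beyond = read_len - insert - adapter
    return ReadLayout(insert, adapter, beyond)


for n in (45, 50, 60):
    print(n, read_layout(n))
# 45 ReadLayout(insert=45, adapter=60, beyond=45)
# 50 ReadLayout(insert=50, adapter=60, beyond=40)
# 60 ReadLayout(insert=60, adapter=60, beyond=30)
```

Under these assumptions, a 45-60 bp insert on a 150-cycle read leaves roughly 30-45 cycles past the end of the adapter, which is where the template is no longer defined.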

I'm not claiming that all fragments are exactly 50 bp; the typical range for my data is 45-60 bp. However, all of my reads are 150 bases long, so usually anywhere from 90 to 105 bases of each read are runs of a single nucleotide.

How long is the adapter sequence in most cases? From what you're saying, it must be long enough to account for 100+ bp in most of my data; otherwise the sequencer would run out of template.

As I said above, the adapters are each ~60 bp long, so together (5' and 3' adapters) they add ~120 bp to the length of the DNA fragment. What you are seeing at the 3' end of your reads is garbage. Once you know the adapter sequence used for your library, trim each read at the start of the adapter, removing the adapter and everything 3' of it.

As you recall, Illumina flow cells are coated with a lawn of oligos. Once you run past the adapter at the other end, who knows what you are getting.
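
A minimal sketch of the trimming described above: find where the adapter starts in each read, then drop the adapter and everything 3' of it (which also removes any homopolymer tail past the adapter), slicing the quality string identically. Assumptions, stated loudly: the library is TruSeq-style, with a 3' adapter read-through beginning AGATCGGAAGAGC (check your kit's documentation); matching is exact, with no mismatches or indels; the FASTQ is plain 4-line records; and `adapter_start`, `read_fastq` and `trim_fastq` are hypothetical names. For real data, a dedicated adapter trimmer such as cutadapt is the usual choice.

```python
import io
from typing import Iterator, TextIO, Tuple

# ASSUMPTION: a TruSeq-style library, whose 3' adapter read-through begins with
# AGATCGGAAGAGC. Replace with the adapter documented for your kit.
ADAPTER = "AGATCGGAAGAGC"


def adapter_start(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> int:
    """Return the index where the adapter begins in seq, or len(seq) if absent.

    Exact matching only (no mismatches). A partial adapter at the very 3' end
    of the read counts if it is at least min_overlap bases long; short overlaps
    can also match by chance, so this trades some false trims for sensitivity.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Try the longest adapter prefixes first so the earliest cut point wins.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a plain 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_fastq(src: TextIO, dst: TextIO) -> None:
    """Cut each read at the adapter, dropping the adapter and everything 3' of it."""
    for header, seq, qual in read_fastq(src):
        cut = adapter_start(seq)
        # Slice sequence and quality identically so they stay the same length.
        dst.write(f"{header}\n{seq[:cut]}\n+\n{qual[:cut]}\n")


# Usage on a made-up 150-cycle read: 50 bp insert, adapter, then a poly-G tail.
insert = "ACGT" * 12 + "AC"
read = (insert + ADAPTER + "G" * 150)[:150]
record = f"@toy_read\n{read}\n+\n{'I' * 150}\n"
out = io.StringIO()
trim_fastq(io.StringIO(record), out)
print(out.getvalue(), end="")
```

The toy record comes out with only the 50 insert bases and 50 matching quality characters; everything from the adapter onward is gone.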
