I have what probably seems like a naive question about what happens if almost all inserts are shorter than the read length, i.e. the number of sequencing cycles.
For instance, the data set that I most recently worked with has an average fragment length of 50 bp. Not knowing this was the case, the first rounds of sequencing used 150 cycles. It seems to me that if the fragment is ~50 bp, then the longest read should terminate at 50 bp as well, since the remaining 100 cycles would not generate additional sequence. However, the fastq file gives me reads of length 150, with the last ~100 bases as long runs of the same nucleotide.
How is this last nucleotide added to the read if there is no more insert to serve as a template? In case it matters, I'm using the Illumina NovaSeq platform.
Sorry if this seems like an uninformed question - perhaps I'm misunderstanding some step of the sequencing process.
I'm not claiming that all fragment lengths are exactly 50; the typical range for my data is 45-60. However, my reads are all 150 bases long, so usually the last 90-105 bases are runs of a single nucleotide.
How long is the adapter sequence in most cases? From what you're saying, it must be long enough to account for 100+ bp in most of my data, otherwise the sequencer would run out of template.
As I said above, the adapters are each ~60 bp long, so together they add ~120 bp (5' and 3' adapters) to the length of the DNA fragment. What you are seeing at the 3' end of the reads is garbage. Once you find the sequence of the adapter used for the library, you should remove the adapter and everything 3' of it, through to the end of the read.
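To make the trimming step concrete, here is a minimal Python sketch of cutting a read at the adapter. It assumes the library uses the common Illumina adapter prefix AGATCGGAAGAGC; that is only an assumption, so substitute the adapter that was actually used for your library. The function name trim_adapter and the min_overlap parameter are made up for illustration, and the matching is exact, whereas dedicated trimmers such as cutadapt also tolerate sequencing errors.

```python
# Assumed adapter start; replace with the adapter used for your library.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def trim_adapter(read, adapter=ADAPTER_PREFIX, min_overlap=5):
    """Return the read with the adapter and everything 3' of it removed.

    Exact matching only. If the whole adapter is not found, a partial
    adapter (at least min_overlap bases) at the very 3' end is removed.
    """
    # Case 1: the full adapter occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]

    # Case 2: the read ends partway through the adapter.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]

    # No adapter found: leave the read unchanged.
    return read


# Toy example: a 50 bp insert, then adapter, then a single-nucleotide tail.
insert = "ACGTTGCA" * 6 + "TT"
read = insert + ADAPTER_PREFIX + "G" * 87   # 150 bases in total
trimmed = trim_adapter(read)
print(len(read), len(trimmed))
```

For real data you would run this kind of logic over every record in the fastq file and trim the quality string to the same length.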
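As you may recall, there is a lawn of oligos on Illumina flowcells. Once the read runs past the adapter on the other end, there is no template left, so the remaining base calls are not real sequence. On two-color instruments such as the NovaSeq, a cycle with no signal is called as G, which is why these tails usually show up as long runs of G.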
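If you just want to check how much of each read is the single-nucleotide tail described in the question, a rough Python sketch like the one below can help. The function name trim_homopolymer_tail and the min_run threshold of 10 are arbitrary choices for illustration, not a standard; adapter trimming should still be done first.

```python
def trim_homopolymer_tail(read, min_run=10):
    """Remove a 3' run of one repeated nucleotide if it is at least min_run long.

    Note: if the real sequence happens to end in the same base as the tail,
    those bases are removed as well.
    """
    if not read:
        return read
    last_base = read[-1]
    stripped = read.rstrip(last_base)      # drop the trailing run of last_base
    run_length = len(read) - len(stripped)
    return stripped if run_length >= min_run else read


# Toy example: 8 real bases followed by 100 repeated bases.
read = "ACGTACGT" + "G" * 100
print(len(read) - len(trim_homopolymer_tail(read)))  # length of the removed tail
```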