Or "Everything You Always Wanted to Know About RNA-Seq (But Were Afraid to Ask) Part 1"
Dear all,
I am afraid I am too much of a data analyst and too little of a wet-lab person (never seen a flow cell in my life) to understand the real reasons why the library size varies (even by millions of reads!) between samples sequenced on the same flow cell. Isn't the number of megareads per sample a parameter decided at the machine level? Does this variability only have to do with multiplexing different samples on the same lane, or is there more to it (amount of RNA or cDNA, PCR efficiency, sequencer intrinsic features)? Can anyone enlighten me about the different factors that affect this parameter, i.e., sample-specific sequencing depth?
When you make libraries, your samples are in the low nM range. When you load libraries on a sequencer, let's say a NextSeq 2000, you only load 20 µL at 750 pM. But you're usually not loading just a single library on the sequencer.
So you dilute your libraries to 2 nM (source of error 1), then you pool them together (source of error 2), then you dilute the pool to 750 pM (source of error 3). On some instruments you also have to denature and dilute, which adds further sources of error. You also have different fragment sizes, and smaller fragments tend to cluster on the flow cell better than larger ones. So even if you dilute perfectly, you have differences in clustering efficiency between samples that could skew read counts one way or the other.
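For the data analysts in the thread, here is a rough simulation of how those small per-step errors can add up to read-count differences between samples. It is only a sketch: the function name, the error magnitudes (5% per pipetting step, 10% for clustering efficiency), the sample count, and the total read yield are all made-up assumptions for illustration, not measured values for any instrument.

```python
import random


def simulate_read_counts(n_samples=8, total_reads=400_000_000,
                         step_cv=0.05, cluster_cv=0.10, seed=0):
    """Simulate per-sample read counts for one pooled run.

    All defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n_samples):
        w = 1.0
        # Error 1: diluting each library to 2 nM.
        w *= rng.gauss(1.0, step_cv)
        # Error 2: pipetting each diluted library into the pool.
        w *= rng.gauss(1.0, step_cv)
        # Fragment-size-dependent clustering efficiency.
        w *= rng.gauss(1.0, cluster_cv)
        # Error 3 (diluting the pool to 750 pM) hits every sample
        # equally, so it changes total yield but not the balance,
        # and is left out of the per-sample weight.
        weights.append(max(w, 0.0))
    total_weight = sum(weights)
    # Each sample gets a share of the run proportional to its weight.
    return [round(total_reads * w / total_weight) for w in weights]


if __name__ == "__main__":
    counts = simulate_read_counts()
    for i, c in enumerate(counts, start=1):
        print(f"sample {i}: {c / 1e6:.1f} M reads")
    print(f"max - min: {(max(counts) - min(counts)) / 1e6:.1f} M reads")
```

With errors of only a few percent per step, the spread between the best and worst sample in a run of this size can plausibly reach millions of reads, which is the kind of variation asked about in the question.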
While all those things are true, it is possible to balance the libraries using a test run on, say, a MiSeq Nano (or perhaps an iSeq), library profiles to identify insert sizes, and a lot of experience with this stuff, so that read numbers come out reasonably balanced or on target (in case you wanted unbalanced pools).
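The rebalancing arithmetic after a test run can be sketched roughly as follows. This assumes that a sample's share of reads scales linearly with the volume of it added to the pool, which is a simplification; the function name, sample names, and numbers are hypothetical.

```python
def rebalance_volumes(test_reads, current_volumes_ul, target_fractions=None):
    """Suggest new pooling volumes from test-run read counts.

    test_reads:         {sample: reads observed in the test run}
    current_volumes_ul: {sample: volume pooled for the test run, in µL}
    target_fractions:   {sample: desired share of reads}; equal if None
    """
    total = sum(test_reads.values())
    if target_fractions is None:
        # Default to a balanced pool.
        target_fractions = {s: 1.0 / len(test_reads) for s in test_reads}
    new_volumes = {}
    for sample, reads in test_reads.items():
        if reads == 0:
            raise ValueError(f"{sample} has no reads; cannot rescale it")
        observed = reads / total
        # Under-represented samples get more volume, over-represented less.
        new_volumes[sample] = (
            current_volumes_ul[sample] * target_fractions[sample] / observed
        )
    return new_volumes


if __name__ == "__main__":
    reads = {"A": 300_000, "B": 100_000}   # hypothetical test-run counts
    volumes = {"A": 5.0, "B": 5.0}         # hypothetical pooled volumes
    print(rebalance_volumes(reads, volumes))
```

Passing explicit target_fractions covers the case where you want a deliberately unbalanced pool. In practice the insert-size profiles and experience mentioned above still matter, because clustering on the test instrument may not match the production instrument.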