I've just started studying RNAseq, and I'm confused about how PCR amplification bias leads to duplicates.
At first I thought that:
- Each cDNA molecule is PCR amplified to form a cluster, and each cluster produces one read.
- It is possible to PCR amplify the cDNA before cluster formation if the starting amount is too small, but because this introduces bias, it is not recommended and is not a routine procedure.
- So duplicate reads come from the cluster formation step: if a cDNA is amplified too much, its cluster can grow too big and be identified as two clusters, resulting in duplicate reads.
But after searching more, I found that this is called an optical duplicate, which is different from a PCR duplicate. I also read that PCR duplicates come from the PCR amplification step before cluster formation.
My questions are:
- Is PCR amplification before cluster formation a routine procedure in RNAseq? What about in single-cell RNAseq?
- If all cDNA is amplified before cluster formation, then shouldn't every read have duplicates? (Each amplified copy of a cDNA has an identical sequence, forms a separate cluster, and is counted as one read.)
Thank you for the reply! I think I was confused because I assumed every cDNA molecule gets sequenced, whereas in fact only a fraction of them are read (random sampling), and the size of that fraction depends on the chosen sequencing depth.
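To see why random sampling resolves the "shouldn't every read have duplicates?" question, here is a toy simulation. It is a minimal sketch under loudly simplified assumptions: a fixed number of unique cDNA molecules, a constant per-cycle PCR efficiency for every molecule (real efficiency varies by sequence, which is where amplification bias comes from), and sequencing modelled as drawing reads at random without replacement from the amplified pool. The function names `amplify` and `duplicate_fraction` are hypothetical, not from any real tool. A read counts as a duplicate here if its source molecule was already read; real pipelines instead infer duplicates from alignment positions, since they cannot see the source molecule.

```python
import random


def amplify(n_molecules, cycles, efficiency, rng):
    """Toy PCR model (assumption): each cycle, every existing copy of a
    molecule is duplicated with probability `efficiency`.
    Returns the final copy count for each original cDNA molecule."""
    copies = []
    for _ in range(n_molecules):
        c = 1
        for _ in range(cycles):
            # count how many of the current c copies get duplicated this cycle
            c += sum(1 for _ in range(c) if rng.random() < efficiency)
        copies.append(c)
    return copies


def duplicate_fraction(copies, n_reads, rng):
    """Sequence `n_reads` reads by sampling the amplified pool at random
    (without replacement) and return the fraction of reads whose source
    molecule had already been read, i.e. 1 - unique / total."""
    # one pool entry per physical copy, labelled by its source molecule
    pool = [mol for mol, c in enumerate(copies) for _ in range(c)]
    reads = rng.sample(pool, n_reads)
    return 1 - len(set(reads)) / n_reads


rng = random.Random(0)
copies = amplify(n_molecules=1000, cycles=8, efficiency=0.9, rng=rng)
for depth in (50, 500, 5000, 50000):
    # duplicate fraction rises as depth approaches and exceeds the
    # number of unique molecules (library complexity)
    print(depth, round(duplicate_fraction(copies, depth, rng), 3))
```

In this toy model, every molecule is amplified, yet at low depth most reads come from distinct molecules, so few duplicates appear. Once the number of reads approaches or exceeds the 1000 unique molecules, the same molecules are necessarily sampled repeatedly and the duplicate fraction climbs toward 1.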