Dear all,
I've recently been working on sequencing data generated from cell-free DNA (cfDNA), and I found that its read duplication rate is much higher than that of normal WGS data sequenced from genomic DNA (~12% vs ~1%). My data are 2×100 bp paired-end reads.
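For context, a duplication rate like the ~12% above is usually the fraction of read pairs whose fragment coordinates repeat a pair already seen. Below is a minimal sketch under a simplified assumption: each pair is reduced to a hypothetical key (chromosome, fragment start, fragment end, strand). This is similar in spirit to what tools such as Picard MarkDuplicates do, but it is not their actual implementation. The function name `duplication_rate` is illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical simplified key for one read pair:
# (chromosome, fragment start, fragment end, strand)
Fragment = Tuple[str, int, int, str]


def duplication_rate(fragments: Iterable[Fragment]) -> float:
    """Fraction of read pairs that repeat an earlier pair's coordinates."""
    counts = Counter(fragments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every copy beyond the first at a given key counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / total


if __name__ == "__main__":
    # One position seen twice, eight distinct positions: 10 pairs, 1 duplicate.
    pairs = [("chr1", 100, 250, "+")] * 2 + [
        ("chr1", i * 1000, i * 1000 + 150, "+") for i in range(1, 9)
    ]
    print(duplication_rate(pairs))  # 0.1
```

Under this model, anything that makes many read pairs share the same start and end coordinates (for example, repeated PCR copies of one original molecule) raises the rate.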
I wonder whether the short insert size of cfDNA (about 150 bp in my data) causes this, since in normal WGS data the insert sizes are usually over 200 bp.
So in theory, genomic DNA sequencing leaves an unsequenced 'gap' in the middle of each fragment: with an insert size over 200 bp, the two 100 bp reads from either end cannot meet at the center. In cfDNA, by contrast, the insert is shorter than 200 bp, so the two reads overlap in the middle of the fragment.
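The overlap/gap reasoning above is simple arithmetic: with read length r and insert size s, the two reads overlap by 2r − s bases when s < 2r, and leave a gap of s − 2r bases when s > 2r. A minimal sketch, assuming both reads are sequenced to full length and the insert is at least one read long (no adapter read-through). The function `read_pair_layout` is hypothetical, and 300 bp is just an example value for a genomic DNA insert over 200 bp.

```python
def read_pair_layout(insert_size: int, read_length: int = 100) -> dict:
    """Describe how two paired-end reads cover one fragment (simple model).

    Assumes full-length reads and insert_size >= read_length
    (shorter inserts would cause adapter read-through, not modeled here).
    """
    if insert_size < read_length:
        raise ValueError("insert shorter than one read: adapter read-through")
    combined = 2 * read_length
    # Bases sequenced twice, once by each read, in the middle of the fragment.
    overlap = max(0, combined - insert_size)
    # Bases in the middle of the fragment that neither read covers.
    gap = max(0, insert_size - combined)
    # Distinct fragment bases actually observed by the pair.
    unique_bases = min(insert_size, combined)
    return {"overlap": overlap, "gap": gap, "unique_bases": unique_bases}


if __name__ == "__main__":
    print(read_pair_layout(150))  # {'overlap': 50, 'gap': 0, 'unique_bases': 150}
    print(read_pair_layout(300))  # {'overlap': 0, 'gap': 100, 'unique_bases': 200}
```

With 2×100 bp reads, a ~150 bp cfDNA fragment yields only 150 distinct bases per pair, while a longer genomic fragment yields the full 200.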
I think the effective (non-overlapping) read length differs between the two sample types, and I once found a figure supporting this idea, but I can no longer find its source.
Can anyone help me with this? I'm just curious.
Thank you for your answer! So cfDNA libraries go through more PCR amplification, in which case a higher duplication rate is reasonable.