What causes high sequence duplication levels in MiSeq?
9.8 years ago
sckinta ▴ 730

Hi there,

I have 30 samples (3 replicates × 10 conditions) that I ran on a MiSeq to evaluate library quality. FastQC reported that 2 of the 30 (both from the same condition) have high sequence duplication levels; they are clearly outliers. I went back and checked RNA quality and read abundance for those two samples and found nothing unusual. After mapping them to the reference transcriptome, their alignment percentages do not differ from the others.

So what makes these two libraries show such high sequence duplication? Should I remake the libraries before proceeding to HiSeq?

RNA-Seq • 3.5k views
9.8 years ago
Asaf 10k

Several factors can generate duplicated reads. Some that I have run into:

  • A redundant RNA (ssrA in my case in E. coli ~ 1.5% of total reads)
  • A lot of PCR
  • Adapters (primer-dimer)

Try to find out which sequences are redundant. Are they genomic? How frequent are they? How many PCR cycles did you run? How did you select your RNA (poly-A, ribo-depleted, sRNAs, etc.)?
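To see which sequences are redundant and how frequent they are, one option is to count exact-match reads directly from the FASTQ file. The sketch below is only an illustration: the function names `read_sequences` and `top_duplicated_sequences` are made up for this example, it assumes standard unwrapped 4-line FASTQ records, and it counts identical full-length reads, which is not the same as FastQC's own duplication estimate.

```python
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record.

    Assumes unwrapped FASTQ: header, sequence, '+', quality.
    """
    # The sequence is the 2nd line of every 4-line record.
    for seq in islice(lines, 1, None, 4):
        yield seq.strip()


def top_duplicated_sequences(lines, n=10):
    """Return up to n (sequence, count, fraction_of_reads) tuples
    for the most frequent sequences that occur more than once."""
    counts = Counter(read_sequences(lines))
    total = sum(counts.values())
    return [
        (seq, count, count / total)
        for seq, count in counts.most_common(n)
        if count > 1
    ]
```

`lines` can be any iterable of text lines, for example a handle from `gzip.open(path, "rt")`. The top sequences can then be compared against adapter sequences or searched against the genome to tell primer-dimers apart from a highly abundant RNA.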

9.8 years ago
Irsan ★ 7.8k

Are you sure that the 2 samples with high duplication levels really have far more duplication than the others, or are they just above the arbitrary threshold used to report duplication? Do they also have comparable mapping percentages when you look only at exonic regions? If so, you can assume that the duplicated reads come from RNA molecules and that the duplication has nothing to do with the MiSeq technology specifically. These two samples may have very high expression of particular genes. It could also be that you had only a small amount of input material for these samples, causing low library complexity and high duplication levels.
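One way to check whether the two samples are real outliers, rather than just over a reporting threshold, is to compute a duplication rate for every sample and compare each one to the rest. The following is a rough sketch under stated assumptions: `duplication_fraction` and `flag_outliers` are hypothetical helpers, the rate is a simple exact-match measure (not FastQC's estimate), and the median plus `k` times the median absolute deviation is just one possible outlier rule, with `k=3.0` chosen arbitrarily.

```python
from statistics import median


def duplication_fraction(sequences):
    """Fraction of reads that are exact copies of an earlier read."""
    seqs = list(sequences)
    if not seqs:
        return 0.0
    return 1 - len(set(seqs)) / len(seqs)


def flag_outliers(rates, k=3.0):
    """Return sample names whose duplication rate exceeds
    median + k * MAD across all samples.

    rates: dict mapping sample name -> duplication fraction.
    Note: if the MAD is 0, any sample above the median is flagged.
    """
    values = list(rates.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    threshold = med + k * mad
    return sorted(name for name, rate in rates.items() if rate > threshold)
```

If the two samples are flagged while their replicates from other conditions cluster tightly, the difference is probably real; if nothing is flagged, they may simply sit just above the reporting cutoff.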
