PCR duplicates and de novo assembly?
8.1 years ago
Picasa ▴ 650

Hi,

Sorry, I am a beginner in the NGS field. I used SGA preqc to assess the quality of my data.

The program estimates the proportion of PCR duplicates at 20%.

1) First, I am not sure I understand what a PCR duplicate is. How does it differ from sequencing coverage?

2) Is 20% a lot?

3) If so, how do I remove them? Is it important to do this before a de novo assembly?

pcr duplicates preqc • 2.7k views
8.1 years ago
Medhat 9.8k

There are two types of duplication: PCR duplication and optical duplication. We remove duplicates mainly to reduce recurrent errors.

  • PCR duplicates are introduced during library preparation.

    You can also find good information about it here.

  • Optical duplicates (Illumina) arise when a single cluster is part of two adjacent tiles on the same slide and is used to compute two separate read calls.

You can use this tool to remove them.

Note: when using the tool I suggested, set the REMOVE_DUPLICATES parameter to true to remove the duplicate records from the resulting file.
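
For a de novo project there may be no alignment to work from, and tools with a REMOVE_DUPLICATES option usually expect mapped reads (that is an assumption about the linked tool, not something stated above). As a rough illustration of what "removing duplicates" means, here is a minimal Python sketch that drops reads whose sequence is an exact copy of one already seen. It is a different, simpler technique than alignment-based duplicate marking: it ignores read pairing, reverse complements and sequencing errors, and the function names `parse_fastq` and `drop_exact_duplicates` are made up for this example.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one FASTQ record
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed, 4-line-per-record FASTQ text."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def drop_exact_duplicates(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Keep only the first read seen for each distinct sequence."""
    seen = set()
    for header, seq, qual in records:
        if seq in seen:
            continue  # exact copy of an earlier read: treat as a duplicate
        seen.add(seq)
        yield header, seq, qual


example = """@r1
ACGT
+
IIII
@r2
ACGT
+
IIII
@r3
TTGA
+
IIII
""".splitlines()

unique = list(drop_exact_duplicates(parse_fastq(example)))
print([header for header, _, _ in unique])
```

Here `@r2` is dropped because its sequence is identical to `@r1`. Reads that merely overlap the same region but start at different positions have different sequences and are kept; those are what contribute to coverage.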


Thanks for your support.

1) Have you had good results from removing those duplicates? (Did your N50 improve a lot?)

2) Should I remove them from the raw reads or from the trimmed reads?

  • As the word "duplication" suggests, leaving them in will not change the N50: the reads are identical, so keeping them adds no new information. On the other hand, they will introduce bias in downstream analysis.
  • First trim your reads, then remove the duplicates.
8.1 years ago
Satyajeet Khare ★ 1.6k

20% duplicates is not a lot, and I think they should not matter much for assembly. These are duplicate reads in your sample. For quantitative analyses such as ChIP-seq and RNA-Seq, they can pose a big problem.
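
To make the 20% figure concrete, here is a naive Python sketch that computes a duplicate fraction as the share of reads that are exact copies of an earlier read. This is not how SGA preqc arrives at its estimate; the function name `duplicate_fraction` and the toy reads are made up for illustration. At high coverage, identical reads can also occur by chance, so a count like this tends to overestimate true PCR duplication.

```python
from collections import Counter
from typing import Iterable


def duplicate_fraction(sequences: Iterable[str]) -> float:
    """Fraction of reads that are extra copies of an already-seen sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no reads, so no duplicates
    distinct = len(counts)
    return 1.0 - distinct / total


# Five toy reads, one of which repeats an earlier sequence.
reads = ["ACGT", "ACGT", "TTGA", "GGCC", "CATG"]
fraction = duplicate_fraction(reads)
print(f"{fraction:.0%} of reads are duplicates")
```

With one extra copy among five reads, four distinct sequences remain, which gives a duplicate fraction of one in five.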
