Question

Doubt about removing duplicates in amplicon sequencing data

2


6.5 years ago

Picasa ▴ 650

Hi,

I am looking to do a simple SNP analysis.

I have several individuals for which we targeted specific markers, so the reads I have come from amplicon sequencing. My questions are:

1) Do I have to remove duplicates? From what I understand, tools like Picard look for reads with the same 5' position, but by definition amplicon sequencing reads all start at the same position.

2) If not: how should I treat these data? If an error is propagated during PCR, it will end up as a bad call.

Edit: 3) There are two types of duplicates, optical and PCR. In that case, do I have to remove only the optical duplicates? If yes, do you know how? It seems that Picard does not separate optical and PCR duplicates.

Thanks for your help.

duplicates amplicon • 4.6k views

ADD COMMENT • link updated 6.5 years ago by Devon Ryan 104k • written 6.5 years ago by Picasa ▴ 650

score 2 · Answer 1 · 2018-05-17

2


6.5 years ago

Devon Ryan 104k

1. No, for the reasons you listed.
2. Correct, that's the downside of amplicons (unless you put UMIs on your PCR primers).
3. Removing optical duplicates can be done with clumpify from BBTools. However, this doesn't work that well for amplicons unless you spiked in a lot of PhiX or had a very large number of amplicons on the same lane. Otherwise you end up removing too many reads (not that this ends up being a huge problem).
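
To illustrate the first two points, here is a minimal Python sketch contrasting position-based duplicate removal with UMI-aware collapsing. It is not how Picard or any other tool is implemented. The toy reads and the `dedup_by_position` and `dedup_by_umi` helpers are hypothetical, and the majority vote is just one simple way to build a consensus.

```python
from collections import Counter, defaultdict

# Toy reads: (amplicon start position, UMI, base observed at the SNP site).
# All values are invented for illustration.
reads = [
    (100, "AACG", "A"),
    (100, "AACG", "A"),
    (100, "AACG", "G"),  # disagreement inside one UMI family
    (100, "TTGC", "A"),
    (100, "TTGC", "A"),
    (100, "GGAT", "G"),
]


def dedup_by_position(reads):
    """Keep the first read seen at each start position."""
    kept = {}
    for start, umi, base in reads:
        kept.setdefault(start, (start, umi, base))
    return list(kept.values())


def dedup_by_umi(reads):
    """Collapse reads sharing start and UMI into one majority-vote base."""
    families = defaultdict(list)
    for start, umi, base in reads:
        families[(start, umi)].append(base)
    return [
        (start, umi, Counter(bases).most_common(1)[0][0])
        for (start, umi), bases in families.items()
    ]


# Every amplicon read starts at the same position, so this keeps one read.
print(len(dedup_by_position(reads)))
# One consensus read per original molecule (per UMI family).
print(sorted(dedup_by_umi(reads)))
```

Position-based removal throws away almost all of the coverage, whereas grouping by UMI keeps one consensus per tagged molecule and outvotes the minority base within a family. An error introduced in the very first PCR cycles would still dominate its family, so this only helps with errors that arise later.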

ADD COMMENT • link 6.5 years ago by Devon Ryan 104k

0


Thanks Devon, I have edited my post with another question. Maybe you have not seen it:

3) There are two types of duplicates, optical and PCR. In that case, do I have to remove only the optical duplicates? If yes, do you know how? It seems that Picard does not separate optical and PCR duplicates.

ADD REPLY • link 6.5 years ago by Picasa ▴ 650

1


I just edited my response accordingly.

ADD REPLY • link 6.5 years ago by Devon Ryan 104k

0


Would you have to remove duplicates when comparing the abundance of two transcript isoforms of a gene?

ADD REPLY • link 4.1 years ago by caggtaagtat ★ 1.9k

1


It depends on how badly affected they are. In general, if the transcripts are expressed highly enough, you will start getting false-positive duplicates (independent fragments that happen to share the same coordinates), so it's best to avoid deduplication unless you really need it.
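
A rough way to see why high expression produces false-positive duplicates is the sketch below. It assumes, as a simplification, that independent fragments land uniformly on a fixed number of possible alignment positions. The function name `expected_false_duplicates` and the example value of 2,000 positions are hypothetical choices for illustration.

```python
def expected_false_duplicates(n_fragments, n_positions):
    """Expected number of independent fragments flagged as duplicates by chance.

    With n_fragments placed uniformly on n_positions possible positions, the
    expected number of distinct occupied positions is
    n_positions * (1 - (1 - 1/n_positions) ** n_fragments).
    Everything beyond that would be marked as a duplicate by position alone.
    """
    expected_distinct = n_positions * (1 - (1 - 1 / n_positions) ** n_fragments)
    return n_fragments - expected_distinct


# Hypothetical transcript offering 2,000 distinct fragment positions.
for n in (100, 1_000, 10_000):
    fraction = expected_false_duplicates(n, 2_000) / n
    print(f"{n} fragments: {fraction:.1%} flagged by chance")
```

Under these assumptions the fraction of falsely flagged reads grows with depth, so removing duplicates penalises the more highly expressed isoform more.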

ADD REPLY • link 4.1 years ago by Devon Ryan 104k