Hi,
I am looking to do a simple SNP analysis.
I have several individuals in which we targeted specific markers, so my reads come from amplicon sequencing. My questions are:
1) Do I have to remove duplicates? From what I understand, tools like Picard flag reads that share the same 5' start position as duplicates, but by definition, amplicon sequencing reads all start at the same position, don't they?
2) If not, how should I handle these data? If an error is propagated during PCR, won't it end up as a bad call?
Edit: 3) There are two types of duplicates, optical and PCR. In that case, do I only have to remove optical duplicates? If so, do you know how? It seems that Picard does not separate optical and PCR duplicates.
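A quick way to see why position-based duplicate marking is a poor fit for amplicons is to apply its core idea to reads that all start at the primer. The Python sketch below is a simplification (Picard MarkDuplicates also uses mate positions, the library, and base qualities to decide which read to keep); the `Read` record and its fields are hypothetical stand-ins for parsed alignments.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    five_prime: int  # 5' coordinate of the read (hypothetical, already computed)
    strand: str      # "+" or "-"


def mark_position_duplicates(reads):
    """Keep one read per (chrom, 5' position, strand) and flag the rest.

    Simplified position-based logic; real tools use more information.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.five_prime, read.strand)].append(read)
    kept, duplicates = [], []
    for group in groups.values():
        kept.append(group[0])         # keep the first read of each group
        duplicates.extend(group[1:])  # everything else is flagged as a duplicate
    return kept, duplicates


# Amplicon reads: every read from the same primer starts at the same position.
amplicon_reads = [Read(f"r{i}", "chr1", 1000, "+") for i in range(500)]
kept, dups = mark_position_duplicates(amplicon_reads)
print(len(kept), len(dups))  # 1 499
```

For SNP calling this would leave a single read per amplicon and strand, which is why position-based deduplication would throw away nearly all of the data here.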
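Without deduplication, one simple heuristic is to ignore alleles supported by only a small fraction of the reads at a site, since in a diploid individual a true heterozygous allele is expected near 50%. The sketch below assumes diploid samples and uses an arbitrary 20% allele-fraction cutoff and a 20-read minimum depth chosen only for illustration; `call_genotype` is a hypothetical helper, not part of any variant caller. It does not fully remove the risk raised in question 2: an error made in an early PCR cycle, especially from few starting templates, can reach a high fraction and still pass.

```python
from collections import Counter


def call_genotype(bases, min_fraction=0.2, min_depth=20):
    """Call a diploid genotype from the bases observed at one site.

    Alleles below `min_fraction` of the reads are treated as likely
    PCR or sequencing errors and ignored. Thresholds are illustrative only.
    """
    if len(bases) < min_depth:
        return None  # too few reads to make a call
    counts = Counter(bases)
    depth = sum(counts.values())
    # Alleles ordered from most to least frequent, keeping only those above the cutoff
    alleles = [a for a, n in counts.most_common() if n / depth >= min_fraction]
    if not alleles:
        return None
    if len(alleles) == 1:
        return (alleles[0], alleles[0])  # homozygous
    return tuple(sorted(alleles[:2]))    # heterozygous: two most frequent alleles


print(call_genotype("A" * 90 + "G" * 10))  # ('A', 'A'): 10% G treated as error
print(call_genotype("A" * 55 + "G" * 45))  # ('A', 'G')
```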
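Optical duplicates are reads from a single cluster that the instrument detected as more than one cluster, so they can be told apart from other duplicates using the tile and x/y coordinates stored in Illumina read names. The sketch below assumes the CASAVA 1.8+ naming layout (`instrument:run:flowcell:lane:tile:x:y`) and a 100-pixel threshold in both x and y, in the spirit of Picard's `OPTICAL_DUPLICATE_PIXEL_DISTANCE` parameter; the read names and helper functions are hypothetical.

```python
from itertools import combinations


def parse_location(read_name):
    """Return (flowcell, lane, tile, x, y) from an Illumina-style read name.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y.
    """
    fields = read_name.lstrip("@").split()[0].split(":")
    flowcell, lane, tile = fields[2], fields[3], fields[4]
    x, y = int(fields[5]), int(fields[6])
    return flowcell, lane, tile, x, y


def optical_pairs(duplicate_names, pixel_distance=100):
    """Among reads already known to be duplicates of one another, return the
    pairs on the same tile that lie within `pixel_distance` in both x and y."""
    pairs = []
    for a, b in combinations(duplicate_names, 2):
        fa, la, ta, xa, ya = parse_location(a)
        fb, lb, tb, xb, yb = parse_location(b)
        same_tile = (fa, la, ta) == (fb, lb, tb)
        if same_tile and abs(xa - xb) <= pixel_distance and abs(ya - yb) <= pixel_distance:
            pairs.append((a, b))
    return pairs


# Hypothetical duplicate group: two nearby reads on tile 1101, one on tile 2104
group = [
    "@M001:12:000000000-ABCDE:1:1101:15589:1332",
    "@M001:12:000000000-ABCDE:1:1101:15620:1390",
    "@M001:12:000000000-ABCDE:1:2104:8123:20011",
]
print(len(optical_pairs(group)))  # 1
```

Comparing every pair is quadratic, which is fine for a sketch but slow for amplicon groups containing thousands of reads.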
Thanks for your help.
Thanks Devon, I have edited my post to add a third question (whether I should remove only optical duplicates, and how, given that Picard does not seem to separate optical and PCR duplicates). Maybe you have not seen it.
I just edited my response accordingly.
Would you have to remove duplicates when comparing the abundance of two transcript isoforms of a gene?
It depends on how badly affected they are. In general, if the transcripts are expressed highly enough, you will start getting false-positive duplicates (independent fragments that happen to start at the same position), so it's best to avoid removing duplicates unless you really need to.
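To put a rough number on the false-positive duplicates: if independent fragments start at positions drawn uniformly from L possible positions, the expected number of occupied positions after N reads is L(1 - (1 - 1/L)^N), and every read beyond the first at an occupied position gets flagged. The sketch below assumes uniform coverage and single-end start positions, both simplifications; the transcript size and read counts are made up for illustration.

```python
def expected_chance_duplicates(n_reads: int, n_positions: int) -> float:
    """Expected reads flagged as duplicates by chance alone (no PCR duplication).

    Assumes each independent fragment starts at one of `n_positions`
    positions chosen uniformly at random; real coverage is uneven.
    """
    # Expected number of distinct start positions occupied after n_reads draws
    occupied = n_positions * (1 - (1 - 1 / n_positions) ** n_reads)
    # All reads beyond the first at each occupied position would be flagged
    return n_reads - occupied


# Hypothetical transcript with ~1,000 possible start positions on one strand
print(round(expected_chance_duplicates(10, 1000), 2))  # lowly expressed: almost none
print(round(expected_chance_duplicates(1000, 1000)))   # 368
```

At high expression over a third of the reads would be removed even though none are PCR copies, which would bias an isoform comparison toward the less expressed isoform.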