high sequence duplication ddRAD
4.3 years ago
gubrins ▴ 350

Hello,

I'm relatively new to NGS analyses. I'm working with single-read ddRAD data from a non-model species, and we have just obtained the FASTQ files. The company removed the adaptors, and I've just done the demultiplexing and some trimming to remove the overhangs. When I run FastQC and MultiQC, I get a high degree of duplication (around 80%). I've seen that this can be normal in RNA-seq data, but what about ddRAD? As I've only just started handling the data, I don't think I did something wrong, but I find this high level of duplication really strange. What do you think?

Thanks in advance :) https://ibb.co/Y2LY4VH

Duplication plot

ddrad duplication sequencing SNP • 1.8k views

Have you checked to see how ddRAD works? If not, start here and take a look at some of the papers included in that link.


Thanks for answering. I'm aware of how ddRAD works, but what I don't understand is the pattern I observe in my data. I uploaded a picture, let's see if you can check it.

4.3 years ago

If you understand ddRAD, you will know why there is a high duplication rate. The two restriction enzymes (double digestion) cut at specific positions in the genome, and the library is enriched for those specific fragments, then narrowed further by size selection and PCR amplification. So you sequence the same genomic fragments over and over, far more often than with whole-genome sequencing. In WGS the fragmentation of DNA is random, so reads start at random positions and exact duplicates are much rarer.

As suggested, read the relevant papers and check how much duplication is reported and how they deal with it.
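
To make the contrast with WGS concrete, here is a minimal toy simulation. It is only a sketch under stated assumptions, not how FastQC estimates duplication: the genome is random, and the genome size, read length, read count and the figure of 500 ddRAD loci are all made-up illustrative values. The only thing it models is where reads are allowed to start.

```python
import random

def duplication_rate(reads):
    """Fraction of reads that are exact copies of another read in the set."""
    return 1 - len(set(reads)) / len(reads)

def simulate(genome, starts, n_reads, read_len, rng):
    """Draw n_reads reads, each beginning at a position picked from `starts`."""
    return [genome[s:s + read_len] for s in rng.choices(starts, k=n_reads)]

rng = random.Random(0)
genome = "".join(rng.choices("ACGT", k=200_000))  # toy genome (assumed size)
read_len, n_reads = 50, 20_000                    # assumed values

# WGS-like: random fragmentation, so a read may start at any position.
wgs_starts = list(range(len(genome) - read_len))
# ddRAD-like: reads can only start at a fixed set of cut sites
# (500 loci is an arbitrary illustrative number).
rad_starts = rng.sample(wgs_starts, 500)

wgs_dup = duplication_rate(simulate(genome, wgs_starts, n_reads, read_len, rng))
rad_dup = duplication_rate(simulate(genome, rad_starts, n_reads, read_len, rng))

print(f"WGS-like duplication:   {wgs_dup:.1%}")
print(f"ddRAD-like duplication: {rad_dup:.1%}")
```

With 20,000 reads drawn from at most 500 distinct start sites, the ddRAD-like set cannot contain more than 500 unique sequences, so its duplication rate is at least 97.5% by construction, while the WGS-like set stays low. Real data will differ (sequencing errors, heterozygous sites, uneven coverage), but the direction of the effect is the same.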


Thanks to both of you. Obviously I don't understand it as well as I thought. I'll check the papers!
