FastQC duplicate sequence plot for targeted sequencing libraries
4.5 years ago
CL ▴ 40

As I understand it, a targeted sequencing library is an enriched library. However, it is not clear to me whether the duplicate sequence plot from FastQC is biased for this type of sequencing. I am aware that duplication is expected in RNA-seq libraries, since duplicates there reflect highly expressed genes. If we sequence a specific set of regions and amplify them several times, I should expect the same pattern here as well. Am I right? Moreover, should most of the sequences fall into the bins with more than one copy?

I found an explanation which mentions that "In both the raw and deduplicated versions of the library the vast majority of reads come from sequences which only occur once within the library- this will be true for Whole-Genome sequencing for example, suggesting that there is a diverse population". For targeted capture sequencing, however, the library is not diverse, so should we expect high duplication levels?

Thanks.

sequencing sequence • 1.0k views

Duplicates are normal and expected in targeted experiments. I do not often perform targeted sequencing myself, but I hear experienced people say to leave duplicates untouched, as the increase in false negatives after removing duplicates does not justify the reduction in false positives.
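
To give a rough feel for why duplicates are expected, here is a minimal sketch. It assumes a deliberately simplified model that is not FastQC's method: reads are drawn uniformly, with replacement, from a fixed pool of distinct fragments. The function name and the read and fragment counts are hypothetical and only for illustration.

```python
def expected_unique_fraction(n_reads: int, n_fragments: int) -> float:
    """Expected fraction of reads that are distinct sequences.

    Assumption (simplified model): each read is drawn uniformly at random,
    with replacement, from n_fragments equally likely distinct fragments.
    """
    if n_reads <= 0 or n_fragments <= 0:
        raise ValueError("n_reads and n_fragments must be positive")
    # Probability that a given fragment is never sampled in n_reads draws.
    p_never_seen = (1.0 - 1.0 / n_fragments) ** n_reads
    # Expected number of distinct fragments observed at least once.
    expected_distinct = n_fragments * (1.0 - p_never_seen)
    return expected_distinct / n_reads


# Hypothetical numbers: the same sequencing depth, very different pool sizes.
wgs_like = expected_unique_fraction(1_000_000, 1_000_000_000)
targeted_like = expected_unique_fraction(1_000_000, 50_000)
print(f"large pool (WGS-like): {wgs_like:.3f}")
print(f"small pool (targeted-like): {targeted_like:.3f}")
```

Under this model, a small pool sequenced deeply leaves only a small fraction of reads unique, even without any PCR bias. Real libraries are not uniform, so treat the numbers as an intuition aid rather than a prediction.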


In my experience with RRBS, which is also an enrichment-based method (in this case combined with bisulfite treatment), high levels of duplication are usually observed and not removed. By "high levels" I mean, for example, that the sequence duplication levels plot in FastQC often reports a "percent of seqs remaining if deduplicated" value below 20-30%.
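
For reference, here is a minimal sketch of what that percentage means, computed by exact matching over a list of read sequences. The function names and the toy reads are hypothetical. As far as I know, FastQC estimates this value from a subset of reads rather than the whole file, so its reported number can differ from this simple calculation.

```python
from collections import Counter


def percent_remaining_if_deduplicated(sequences):
    """Percentage of reads left if each distinct sequence were kept once."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences given")
    return 100.0 * len(counts) / total


def duplication_level_histogram(sequences):
    """Map duplication level (copies per sequence) -> number of distinct sequences."""
    counts = Counter(sequences)
    return Counter(counts.values())


# Toy example: one sequence seen 8 times plus two singletons.
reads = ["ACGT"] * 8 + ["TTTT", "GGGG"]
print(percent_remaining_if_deduplicated(reads))
print(dict(duplication_level_histogram(reads)))
```

In the toy example, 3 distinct sequences remain out of 10 reads, which is the kind of value I would call a high duplication level.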
