I ran FastQC on Illumina FASTQ files from a MiSeq and found a very high level of sequence duplication, as reported elsewhere. Here, I just want to understand how the total duplicate percentage is calculated. The output from fastqc_data.txt is:
Sequence Duplication Levels fail
Total Duplicate Percentage 90.87717882197605
Duplication Level Relative count
1 100.0
2 17.39881642975786
3 8.231572445683474
4 4.841954142913186
5 2.9630989841995876
6 1.9648267630438956
7 1.3296385019239276
8 0.9081272379744089
9 0.6627325615364712
10++ 3.6083033545619205
Where does 90.87717882197605 come from? Thanks in advance.
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/9%20Duplicate%20Sequences.html
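To make the arithmetic concrete, here is a rough Python sketch, not FastQC's actual code. It assumes two things the thread does not confirm: that each "Relative count" is the number of distinct sequences at that duplication level, scaled so that level 1 equals 100, and that the total duplicate percentage is the share of reads that are extra copies, i.e. 100 * (reads - distinct sequences) / reads. Under those assumptions the table alone only gives a lower bound, because the 10++ bin hides how many copies its sequences really have. The function names are made up for illustration.

```python
# Relative counts copied from the fastqc_data.txt excerpt in the question.
# Key 10 stands for the "10++" bin (10 or more copies).
relative_counts = {
    1: 100.0,
    2: 17.39881642975786,
    3: 8.231572445683474,
    4: 4.841954142913186,
    5: 2.9630989841995876,
    6: 1.9648267630438956,
    7: 1.3296385019239276,
    8: 0.9081272379744089,
    9: 0.6627325615364712,
    10: 3.6083033545619205,
}
reported_pct = 90.87717882197605


def duplicate_percentage(counts, top_bin_level=10):
    """Percent of reads that are extra copies (ASSUMED definition).

    top_bin_level is the copy number assigned to every sequence in the
    10++ bin; the default of 10 is the smallest value it can take.
    """
    distinct = sum(counts.values())
    reads = sum(
        (top_bin_level if level == 10 else level) * n
        for level, n in counts.items()
    )
    return 100.0 * (reads - distinct) / reads


def implied_top_bin_level(counts, reported):
    """Mean copy number the 10++ bin would need to reproduce `reported`."""
    distinct = sum(counts.values())
    reads_needed = distinct / (1.0 - reported / 100.0)
    reads_below = sum(level * n for level, n in counts.items() if level < 10)
    return (reads_needed - reads_below) / counts[10]


lower_bound = duplicate_percentage(relative_counts)
mean_level = implied_top_bin_level(relative_counts, reported_pct)
print(f"lower bound from the table alone: {lower_bound:.1f}%")
print(f"implied mean copies in the 10++ bin: {mean_level:.0f}")
```

If these assumptions hold, the lower bound falls well short of the reported value, which would mean the sequences in the 10++ bin are each present in far more than 10 copies. The exact formula should be checked against the documentation linked above for the FastQC version in use.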
Thank you all for the prompt responses.
If you are happy with Istvan's answer, you may choose it as the correct answer. (No obligation, just suggesting it in case you now have all the information you were looking for.)
Thanks, Tony. Yes, I only just learned about the green check.