I have read a few threads here from others about short-read duplication levels in raw data produced by sequencing platforms, and I have also read Istvan's tutorial on this issue. However, I didn't find a straightforward answer to "what do you do if you have high duplication levels in filtered WGS metagenomics data?"

This time I ran Trim Galore to filter my raw dataset and then looked at the FastQC report on my trimmed reads: it reports almost 79.5% duplication (based on the old v0.10.0 of FastQC). The other FastQC modules look OK.

Some say that when abundance is the goal you can simply ignore the FastQC duplication result, since in a metagenome the same read can come from multiple organisms. I know that for RNA-seq many people ignore the duplication module entirely and move on to downstream analysis. Since I'm interested in the abundance of certain genes in a metagenome, should I likewise ignore the duplication result, even though it seems quite high (79.5%), and move on to abundance analysis of my genes of interest?

How do you handle high duplication levels? Do you remove the duplicate (i.e. non-unique, non-distinct) sequences?
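For context, here is a minimal Python sketch of roughly what the FastQC duplication number measures: the fraction of reads whose sequence has already been seen. This is only an illustration, not FastQC's actual implementation — FastQC tracks a subsample of distinct sequences and truncates long reads before counting, so real reports will differ; the `duplicate_fraction` helper and the toy reads below are invented for the example.

```python
from collections import Counter

def duplicate_fraction(sequences, prefix_len=50):
    """Fraction of reads that are copies of an earlier read.

    Approximation of FastQC's duplication metric: reads are
    truncated to `prefix_len` bases (FastQC truncates long
    reads similarly) and counted by exact sequence match.
    Returns 1 - (distinct reads / total reads).
    """
    counts = Counter(seq[:prefix_len] for seq in sequences)
    total = sum(counts.values())
    distinct = len(counts)
    return 1 - distinct / total

# Toy example: 10 reads, of which only 4 are distinct sequences.
reads = ["ACGT" * 10] * 5 + ["TTTT" * 10] * 3 + ["GGCC" * 10, "AATT" * 10]
print(f"{duplicate_fraction(reads):.1%} duplication")  # → 60.0% duplication
```

A value like 79.5% means roughly four out of five reads are exact copies of some other read — which in a deeply sequenced metagenome can be genuine biology (many reads from the same abundant organism) rather than PCR artefacts.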