Hi everyone!
I have received a set of raw reads from the sequencing company and am now assessing their quality. The run was 100 bp paired-end on a NovaSeq 6000, using purified E. coli genomic DNA. My problem is that I am seeing a lot of duplicate sequences (see picture below). This happens in each of the 96 samples that I am analysing. The reads do not contain adapters, and even if they did, I wouldn't expect adapters to cause this level of duplication. The FastQC report doesn't list any overrepresented sequences. The only explanations that make sense to me are technical duplication (as I have read in other posts) or the fact that the coverage is around 500X. In the latter case, I understand that I would naturally get duplicates, but I can't explain why only of that size. Can anyone please point me in the right direction? Should I contact the sequencing company?
Sorry if this is a repost. I have been reading a lot of related questions, but none of them quite answers mine.
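To put a rough number on the coverage explanation, here is a minimal back-of-the-envelope sketch. It assumes things not stated above: that the 500X figure counts both mates of each pair, that read starts are spread uniformly over both strands (a simple Poisson model), and that each FASTQ file is assessed on its own, so only one mate per pair counts. The function name and the ~4.6 Mb genome size are illustrative, and this is not how FastQC itself computes its duplication estimate.

```python
import math

def expected_unique_fraction(coverage, genome_size, read_length, paired=True):
    """Rough fraction of reads left after removing exact-start duplicates.

    Assumes uniformly random read starts on both strands (Poisson model)
    and that only one mate of each pair is considered, as when a single
    FASTQ file is inspected on its own.
    """
    mates = 2 if paired else 1
    # Reads in one file: total sequenced bases / bases contributed per fragment
    reads_per_file = coverage * genome_size / (read_length * mates)
    # Possible start sites: every position on both strands
    start_sites = 2 * genome_size
    lam = reads_per_file / start_sites  # mean reads per start site
    # Expected occupied sites divided by total reads
    return (1 - math.exp(-lam)) / lam

# Assumed inputs: 500X coverage, ~4.6 Mb E. coli genome, 100 bp paired-end
frac = expected_unique_fraction(500, 4_600_000, 100)
print(f"Expected fraction remaining after deduplication: {frac:.1%}")
```

Under these assumptions only a bit more than half of the reads would be unique by start position, so a high duplication level is plausible from coverage alone. Note that the genome size cancels out in this model: the result depends only on coverage and read length.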
Thank you very much!
Hey Lieven!
The binning thing makes sense; I hadn't considered it. I'll just proceed with the rest of the analysis as it is. Honestly, I didn't want to contact the sequencing facility unless it was super necessary.
Thanks for your opinion and help!