Question

Sequence duplication in FastQC - genome sequencing

2.4 years ago

SushiRoll ▴ 140

Hi everyone!

I have received a set of raw reads from the sequencing company and am now assessing their quality. The run was 100 bp paired-end on a NovaSeq 6000 with purified E. coli genomic DNA, and my problem is that I am seeing a lot of duplicate sequences (see the picture below). This happens in every one of the 96 samples I am analysing. The reads do not contain adapters, and even if they did, I wouldn't expect adapters to cause this level of duplication. The FastQC report doesn't flag any overrepresented sequences either. The only explanations that make sense to me are technical duplication (as I have read in other posts) or the coverage, which is around 500X. In the latter case, I understand that I would naturally get duplicates, but I can't explain why they show up only at that particular duplication level. Can anyone point me in the right direction? Should I contact the sequencing company?
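A rough check of the coverage explanation, written as a sketch under simplifying assumptions and not as a description of how FastQC computes duplication: treat each FASTQ file (R1 or R2) on its own, assume fragment start positions are uniformly random on both strands, and ignore sequencing errors and genomic repeats, so two reads are duplicates exactly when they share a start position and strand. Reads per site are then approximately Poisson, and the mean per site works out to coverage / (4 × read length), independent of genome size. The function name `expected_duplicate_fraction` is hypothetical, and the 4,641,652 bp genome size (E. coli K-12 MG1655) is only an example.

```python
import math


def expected_duplicate_fraction(genome_size: int, coverage: float, read_length: int) -> float:
    """Rough expected fraction of reads in ONE FASTQ file (R1 or R2) that are
    exact copies of another read already seen, assuming uniformly random
    fragment starts on both strands, no sequencing errors and no repeats."""
    # Paired-end: each read pair contributes 2 * read_length bases of coverage,
    # and each file holds one read per pair.
    n_reads = genome_size * coverage / (2 * read_length)
    # Under the assumptions, a read is fully determined by its start and strand.
    n_sites = 2 * genome_size
    lam = n_reads / n_sites  # mean reads per site; equals coverage / (4 * read_length)
    # Poisson approximation: expected number of sites hit at least once.
    occupied = n_sites * (1 - math.exp(-lam))
    # Each occupied site yields one distinct sequence; every other read is a duplicate.
    return 1 - occupied / n_reads


if __name__ == "__main__":
    # ~4.64 Mb is the E. coli K-12 MG1655 genome size, used here only as an example.
    print(round(expected_duplicate_fraction(4_641_652, 500, 100), 3))  # prints 0.429
```

Under these assumptions, roughly 43% of the reads in each file would be duplicates by chance alone at 500X, compared with under 4% at 30X (`expected_duplicate_fraction(4_641_652, 30, 100)`).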

Sorry if this is a repost; I have read a lot of related questions, but none of them quite answers mine.

Thank you very much!

Tags: FASTQC, WGS, DNAseq

Written 2.4 years ago by SushiRoll; updated 2.4 years ago by lieven.sterck.

score 2 · Accepted Answer · 2022-07-12

2.4 years ago

lieven.sterck 15k

My best guess would also be duplication due to the immense coverage (500x). That you don't see this clearly in the plot might be a binning issue.
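To see how purely coverage-driven duplicates would spread over a duplication-level plot, here is a sketch under the same simplifying assumptions (uniform random fragment starts, no sequencing errors, one FASTQ file at a time). If reads per start position and strand are Poisson(λ), the fraction of reads whose sequence occurs exactly k times equals the Poisson pmf at k - 1. For 100 bp paired-end reads at 500x, each file has λ = 500 / (4 × 100) = 1.25. The bin edges below are my assumption of FastQC's grouping (levels 1 to 9 individually, then >10, >50, ... >10k), and `read_fraction_by_level` is a hypothetical helper.

```python
import math

# Duplication-level bins, assumed to mirror FastQC's grouping:
# levels 1..9 individually, then ranges [lo, hi) with an open-ended last bin.
BINS = [(k, k + 1, str(k)) for k in range(1, 10)] + [
    (10, 50, ">10"),
    (50, 100, ">50"),
    (100, 500, ">100"),
    (500, 1000, ">500"),
    (1000, 5000, ">1k"),
    (5000, 10000, ">5k"),
    (10000, None, ">10k"),
]


def read_fraction_by_level(lam: float, max_level: int = 20000) -> dict[str, float]:
    """Fraction of reads falling in each duplication-level bin when reads per
    (start, strand) site are Poisson(lam). Reads whose sequence occurs exactly
    k times make up pmf(k - 1) of all reads."""
    fractions = {label: 0.0 for _, _, label in BINS}
    p = math.exp(-lam)  # Poisson pmf at 0 = fraction of reads at level 1
    for k in range(1, max_level + 1):
        for lo, hi, label in BINS:
            if k >= lo and (hi is None or k < hi):
                fractions[label] += p
                break
        p *= lam / k  # advance the pmf from k - 1 to k
        if p == 0.0:  # underflow: remaining levels contribute nothing
            break
    return fractions


if __name__ == "__main__":
    for label, frac in read_fraction_by_level(1.25).items():
        print(f"{label:>5}: {frac:.4%}")
```

With λ = 1.25, level 2 is the largest bin, about 99% of reads fall in levels 1 to 5, and the >10 bins are essentially empty. Coverage-driven duplication under this model therefore concentrates in the lowest levels rather than producing high-level spikes.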

Hey Lieven!

The binning explanation makes sense; I hadn't considered it. I'll proceed with the rest of the analysis as is. I honestly didn't want to contact the sequencing facility unless absolutely necessary.

Thanks for your opinion and help!

Reply by SushiRoll, 2.4 years ago.