Sequence duplication FASTQC - genome sequencing
2.4 years ago
SushiRoll

Hi everyone!

I have received a set of raw reads from the sequencing company and I am now assessing their quality. The run was 100 bp paired-end on a NovaSeq 6000, using purified E. coli genomic DNA. My problem is that I see a lot of duplicate sequences (see the FastQC Sequence Duplication Levels plot). This happens in every one of the 96 samples I am analysing. The reads do not contain adapters, and even if they did, I wouldn't expect adapters to cause this level of duplication. The FastQC report doesn't flag any overrepresented sequences either. The only explanations that make sense to me are technical duplication (as I have read in other posts) or the very high coverage, which is around 500X. In the latter case I understand that I would naturally get duplicates, but I can't explain why they only show up at that particular duplication level. Can anyone point me in the right direction? Should I contact the sequencing company?

Sorry if this is a repost; I have read a lot of related questions, but none of them quite answers mine.

Thank you very much!

Tags: FASTQC, WGS, DNAseq

2.4 years ago

My best guess would also be duplication due to the immense coverage (500x). That you don't clearly see it in the plot might be a binning issue.
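To make the coverage argument concrete, here is a minimal back-of-the-envelope sketch. It assumes (these numbers are not stated exactly in the thread) a ~4.6 Mb E. coli genome, 2 x 100 bp reads and ~500x total coverage, and it models the number of reads starting at each genomic position (per strand) as Poisson-distributed with perfectly uniform coverage. FastQC calls two reads duplicates when their sequences match exactly, so reads sharing a start position and strand count as duplicates of each other. The function names are hypothetical, chosen for illustration only.

```python
import math

# Illustrative assumptions (not stated exactly in the thread):
# E. coli genome of ~4.6 Mb, 2 x 100 bp reads, ~500x total coverage.
GENOME_SIZE = 4_600_000
READ_LEN = 100
COVERAGE = 500


def reads_per_mate_file(coverage, genome_size, read_len):
    """Reads in one FASTQ file (R1 or R2) of a paired-end run."""
    total_bases = coverage * genome_size
    return total_bases / (2 * read_len)


def mean_reads_per_start(n_reads, genome_size):
    """Expected reads per start position; each base is a start on both strands."""
    return n_reads / (2 * genome_size)


def read_fraction_at_level(k, lam):
    """Fraction of reads whose start position holds exactly k reads.

    If the read count per position is Poisson(lam), a randomly chosen read
    sees itself plus Poisson(lam) others, so P(level = k) = P_Poisson(k - 1).
    """
    j = k - 1
    return math.exp(-lam) * lam ** j / math.factorial(j)


def remaining_if_deduplicated(lam):
    """Distinct sequences / total reads = (1 - e^-lam) / lam."""
    return (1 - math.exp(-lam)) / lam


n = reads_per_mate_file(COVERAGE, GENOME_SIZE, READ_LEN)
lam = mean_reads_per_start(n, GENOME_SIZE)
print(f"reads per file: {n:,.0f}, mean reads per start: {lam:.2f}")
for k in range(1, 8):
    print(f"level {k}: {read_fraction_at_level(k, lam):.1%} of reads")
print(f"remaining if deduplicated: {remaining_if_deduplicated(lam):.1%}")
```

Under these assumptions each mate file holds about 11.5 million reads, with a mean of 1.25 reads per start position. Only about 29% of reads are unique, most duplicated reads sit at levels 2 to 4, and essentially none reach level 10 or above, so roughly 57% of reads would remain after deduplication. High but low-level duplication of this kind is what pure coverage would produce. The real plot will differ somewhat: sequencing errors break exact matches, coverage is never perfectly uniform, FastQC only tracks sequences first seen in the first 100,000 reads (and truncates reads longer than 75 bp to 50 bp for this module), and any PCR or optical duplicates would add on top.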


Hey Lieven!

The binning point makes sense; I hadn't considered it. I'll just proceed with the rest of the analysis as it is. I honestly didn't want to contact the sequencing facility unless it was really necessary.

Thanks for your opinion and help!
