Question

Sequence Duplication Levels failed FastQC Report


2.8 years ago

onca ▴ 10

Hi all,

I'm checking the quality of my RNA-Seq data with FastQC, and all my FASTQ files fail "Per base sequence content" and "Sequence Duplication Levels". There is also a warning on "Overrepresented sequences", but only for the read 1 files (the data are paired-end, and the overrepresented sequences match between samples). Below is an example; the reports are very similar across all FASTQ files.

[Image: 16_1_perBaseSequenceContent]

[Image: 16_1_sequenceDuplicationLevels]

[Image: 16_1_overrepresentedSequences]

Can you give me any clue about the possible causes or how to investigate them?

Important note: it's DNBseq (BGI sequencer).

Thank you,

sequenceDuplication qualityCheck fastqc rna-seq • 4.0k views

ADD COMMENT • link updated 2.8 years ago by GenoMax 148k • written 2.8 years ago by onca ▴ 10

score 1 · Answer 1 · 2022-03-22


2.8 years ago

GenoMax 148k

Please see the informative blog posts on these topics by the authors of FastQC: https://sequencing.qcfail.com/software/fastqc/

Specifically, https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/ and https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/

ADD COMMENT • link 2.8 years ago by GenoMax 148k


For "Sequence Duplication Levels", I will try to plot duplication vs read density to check for technical duplication, thank you!
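As a quick first look before any alignment-based plot, exact duplicate reads can be counted straight from a FASTQ file. The sketch below is only an illustration: it is not FastQC's own algorithm (which, as far as I know, works on a subset of reads), and because it uses no alignments it is not the duplication vs read density plot itself. The function names `read_sequences` and `duplication_levels` are made up for this example.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path, limit=None):
    """Yield the sequence line of each FASTQ record (plain or gzipped file)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the 2nd line.
        seqs = islice(handle, 1, None, 4)
        # limit=None reads the whole file; an integer caps the number of reads.
        for seq in islice(seqs, limit):
            yield seq.strip()


def duplication_levels(sequences):
    """Map duplication level (copies of a sequence) to % of all reads at that level."""
    copies = Counter(sequences)        # sequence -> number of copies
    levels = Counter(copies.values())  # number of copies -> distinct sequences
    total = sum(copies.values())       # total reads counted
    return {lvl: 100.0 * lvl * n / total for lvl, n in sorted(levels.items())}
```

For example, `duplication_levels(read_sequences("16_1.fq.gz", limit=200000))`, where `16_1.fq.gz` is a hypothetical file name, returns a dictionary such as `{1: ..., 2: ..., ...}` giving the percentage of reads seen once, twice, and so on. Every distinct sequence is held in memory, so a read limit is sensible for large files.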

But for "Per base sequence content", what's bothering me is not the biased sequence at the beginning, but the separation between G/C and A/T proportions. Could it reflect duplicated sequences too?
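To put numbers on what the plot shows, the per-position base composition can be recomputed from the reads. This is a minimal sketch with made-up function names (`per_base_content`, `max_pair_gap`), not FastQC's code. If I read the FastQC documentation correctly, the module compares A with T and G with C at each position (warning above 10%, failure above 20%), so the second function reports exactly those two gaps.

```python
from collections import Counter


def per_base_content(sequences):
    """Return one {base: percentage} dict per read position, for A, C, G and T."""
    counts = []  # counts[i] is a Counter of the bases seen at position i
    for seq in sequences:
        if len(seq) > len(counts):
            counts.extend(Counter() for _ in range(len(seq) - len(counts)))
        for i, base in enumerate(seq.upper()):
            counts[i][base] += 1
    content = []
    for pos in counts:
        # Ambiguous calls such as N are left out of the denominator.
        total = sum(pos[b] for b in "ACGT")
        content.append({b: 100.0 * pos[b] / total if total else 0.0 for b in "ACGT"})
    return content


def max_pair_gap(content):
    """Largest |A-T| and |G-C| percentage difference over all positions."""
    at_gap = max(abs(p["A"] - p["T"]) for p in content)
    gc_gap = max(abs(p["G"] - p["C"]) for p in content)
    return at_gap, gc_gap
```

`per_base_content` accepts any iterable of read strings. Comparing `max_pair_gap` over all positions with the same value computed after dropping the first 10 to 15 positions would show whether the failure comes from the start-of-read bias or from the rest of the read.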

ADD REPLY • link 2.8 years ago by onca ▴ 10


A separation between G/C and A/T is possible because your organism may have GC-rich exons. It could also reflect rRNA, if it was not completely removed.

ADD REPLY • link 2.8 years ago by GenoMax 148k


But in the case of GC-rich exons, "Per sequence GC content" should fail too, right? That is not the case for any of my FASTQ files...

To check for rRNA, could I BLAST the overrepresented sequences? I have also read about adapter dimers; do you know how I could check for them?
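One possible way to prepare such a BLAST search is to pull the overrepresented sequences out of the `fastqc_data.txt` file that FastQC writes inside each report archive and save them as FASTA. The sketch below assumes the usual tab-separated layout of that file (a `>>Overrepresented sequences` module closed by `>>END_MODULE`); the function name and the FASTA header format are invented for this example. The "Possible Source" column is kept in the header because FastQC fills it with contaminant matches such as adapters, or "No Hit".

```python
def overrepresented_to_fasta(fastqc_data_path, fasta_path):
    """Write the 'Overrepresented sequences' table of fastqc_data.txt as FASTA.

    Returns the number of sequences written (0 if the table is absent).
    """
    written = 0
    in_module = False
    with open(fastqc_data_path) as src, open(fasta_path, "w") as out:
        for line in src:
            line = line.rstrip("\n")
            if line.startswith(">>Overrepresented sequences"):
                in_module = True
            elif line.startswith(">>END_MODULE"):
                in_module = False
            elif in_module and line and not line.startswith("#"):
                # Columns: sequence, count, percentage, possible source
                seq, count, pct, source = line.split("\t")[:4]
                written += 1
                out.write(f">overrep_{written} count={count} pct={pct} source={source}\n")
                out.write(seq + "\n")
    return written
```

The resulting FASTA file can then be submitted to BLAST. Hits to ribosomal RNA would point to residual rRNA, while sequences whose source column already names an adapter would point to adapter contamination.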

[Image: 16_1_perSequenceGCContent]

ADD REPLY • link 2.8 years ago by onca ▴ 10


None of the FastQC failures prevents you from proceeding with the data analysis. In fact, you should do so. If there are issues downstream (e.g. the alignment percentage looks bad, or you are not able to assign counts to genes), then backtrack and investigate what is causing them.

ADD REPLY • link 2.8 years ago by GenoMax 148k