Question

How should I interpret the influence of overrepresented and duplicate sequences on %GC content?

Asked 4 months ago by tvibhaps

I have FastQC and MultiQC reports for a batch of paired-end (PE) bulk RNA-seq data. Across the 80 FastQC reports, for both raw and Trimmomatic-trimmed reads, I see the following. Overrepresented sequences (2%-6%), with TruSeq adapters listed as the possible source, appear in 6 of the 160 read files (2 samples with both mates affected, 2 with the forward read only), in the raw and trimmed data alike; clipping did not remove them. Sequence duplication is ~45-50% in every read file, raw and trimmed. The mean GC content is 51%, with a sharper peak than the theoretical distribution in all 160 raw and 154 trimmed files (FastQC places 154 in the Warning and 6 in the Fail category for GC content). Adapter content is present in the raw reads, but trimming removed it.
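To get a feel for what the duplication figure measures, you can count exact-duplicate sequences yourself. The sketch below is a simplification, not FastQC's algorithm (FastQC only tracks sequences that first appear early in each file), so its numbers will not match the report exactly. The helper names `fastq_sequences` and `duplication_percent` are hypothetical, and the reader assumes the standard 4-line FASTQ layout.

```python
import gzip
import itertools
from collections import Counter


def fastq_sequences(path):
    """Yield read sequences from a FASTQ file (plain or .gz), assuming the
    standard 4-line record layout with the sequence on the 2nd line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Every 4th line, starting at the 2nd (index 1), is a sequence line
        for line in itertools.islice(handle, 1, None, 4):
            yield line.rstrip("\n")


def duplication_percent(sequences):
    """Percentage of reads whose exact sequence already occurred earlier,
    i.e. 100 minus the percentage that would remain after exact deduplication."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - len(counts)) / total


# Usage (hypothetical file name):
# print(duplication_percent(fastq_sequences("sample_R1.fastq.gz")))
```

This keeps every distinct sequence in memory, so for large files it is more practical to run it on a subsample of reads.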

The FastQC/MultiQC reports show that trimming changed the GC content of some reads, visible as double spikes in 6 of the 160 trimmed read files. Below is 1 of those 6 trimmed files that fails the GC content check, together with its corresponding raw file:

[Figure: Adapter content in raw]

[Figure: GC content in raw]

[Figure: GC content in trimmed]
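FastQC's "Per sequence GC content" plot compares the observed per-read GC distribution with a modelled normal distribution; according to its documentation, it warns when the summed deviation exceeds 15% of reads and fails above 30%. The sketch below reproduces the idea with a simplified fit (a normal curve using the histogram's own mean and SD), so its values will not match FastQC's exactly and are best used to compare raw against trimmed files. The function names are hypothetical.

```python
import math


def gc_percent(seq):
    """GC percentage of one read, rounded to an integer bin 0-100.
    Ns count toward the read length (a simplifying assumption)."""
    if not seq:
        return 0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return round(100 * gc / len(seq))


def gc_histogram(sequences):
    """Number of reads in each integer GC bin (index 0..100)."""
    hist = [0] * 101
    for seq in sequences:
        hist[gc_percent(seq)] += 1
    return hist


def deviation_from_normal(hist):
    """Sum of |observed - expected| over all bins, divided by the read count,
    where 'expected' is a normal curve with the histogram's own mean and SD."""
    n = sum(hist)
    if n == 0:
        raise ValueError("empty histogram")
    mu = sum(gc * count for gc, count in enumerate(hist)) / n
    var = sum(count * (gc - mu) ** 2 for gc, count in enumerate(hist)) / n
    if var == 0:
        raise ValueError("all reads fall in one GC bin; no normal curve to fit")
    sd = math.sqrt(var)
    total = 0.0
    for gc, observed in enumerate(hist):
        expected = n * math.exp(-0.5 * ((gc - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        total += abs(observed - expected)
    return total / n
```

A double spike, such as two sharp peaks sitting on a smooth curve, pushes this value up quickly, because reads piled into a few bins are far from what a single normal curve predicts.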

For this sample, the overrepresented sequence (6.56% of reads) is identified as TruSeq Adapter, Index 6 (97% over 37bp), in both the raw and the trimmed file.
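One way to check whether the overrepresented sequence, or the reads in general, still carry adapter after trimming is to search for the 13-bp start shared by the Illumina TruSeq read 1 and read 2 adapters (`AGATCGGAAGAGC`). This is a minimal exact-match sketch: `has_adapter`, `adapter_fraction` and the `min_len=8` cut-off for partial 3' matches are hypothetical choices, and it ignores sequencing errors inside the adapter.

```python
# Common 13-bp start shared by the Illumina TruSeq read 1 and read 2 adapters
TRUSEQ_PREFIX = "AGATCGGAAGAGC"


def has_adapter(seq, adapter=TRUSEQ_PREFIX, min_len=8):
    """True if the read contains the adapter prefix anywhere, or ends with
    at least `min_len` bases of it (an adapter cut off at the 3' end)."""
    seq = seq.upper()
    if adapter in seq:
        return True
    # Try progressively shorter adapter starts against the read's 3' end
    for k in range(len(adapter) - 1, min_len - 1, -1):
        if seq.endswith(adapter[:k]):
            return True
    return False


def adapter_fraction(sequences, **kwargs):
    """Fraction of reads for which has_adapter() is True."""
    reads = list(sequences)
    if not reads:
        return 0.0
    return sum(has_adapter(r, **kwargs) for r in reads) / len(reads)
```

You can also paste the overrepresented sequence shown in the FastQC report straight into `has_adapter` to see whether it overlaps this prefix.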

However, quality scores are very good: the median lies between 30 and 40 for all 160 files, raw and trimmed.
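The median values in FastQC's per-base quality plot are per-position summaries of the Phred scores encoded in each read's quality line. The sketch below decodes them, assuming Phred+33 encoding (Sanger / Illumina 1.8+); the function names are hypothetical.

```python
from statistics import median


def phred33_scores(qual):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual]


def per_position_median(quality_strings):
    """Median Phred score at each read position; shorter (trimmed) reads
    simply contribute to fewer positions."""
    columns = []
    for qual in quality_strings:
        for i, score in enumerate(phred33_scores(qual)):
            if i == len(columns):
                columns.append([])
            columns[i].append(score)
    return [median(col) for col in columns]
```

Good per-base quality says the base calls are reliable; it says nothing about what the reads are, which is why it can coexist with adapter-derived overrepresented sequences.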

How should I interpret the spikes in GC content? I have read several FastQC interpretation guides by its author and related community discussions, but could not decide whether to proceed with mapping. Should I ignore the spikes and the overrepresented sequences since sequence quality is good, or do something else? I appreciate your time and suggestions.

Tags: Duplicates, FASTQC, MultiQC, Fastq, GC • 642 views

Reply
that is evident in double spikes in 6/160 trimmed read

Check whether those samples contain rRNA; it has a different GC content from other genes.
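A quick way to estimate rRNA content is an exact k-mer screen against rRNA reference sequences for your organism. This sketch assumes you supply those references yourself as plain strings; `k=25` and the helper names are hypothetical, and because matching is exact, reads with sequencing errors in every k-mer are missed, so treat the result as a lower bound.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_rrna_index(reference_seqs, k=25):
    """k-mers from both strands of the supplied rRNA reference sequences."""
    index = set()
    for ref in reference_seqs:
        index |= kmers(ref, k)
        index |= kmers(reverse_complement(ref), k)
    return index


def rrna_fraction(read_seqs, index, k=25):
    """Fraction of reads sharing at least one k-mer with the rRNA index."""
    reads = list(read_seqs)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if kmers(read, k) & index)
    return hits / len(reads)
```

Comparing this fraction between the 6 failing files and a few passing ones would show whether rRNA could explain the extra GC peak.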

Having some samples "fail" FastQC criteria does not automatically make them bad, and there is no rule that says you can't move forward with the analysis. If you notice anything strange in the PCA etc. after counting and starting basic analysis, then consider whether to backtrack and investigate, or to drop the outlier samples (if justified).
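As a rough illustration of the PCA check, the sketch below computes sample coordinates from a genes x samples count matrix using log2 CPM and an SVD. This simple transform is a stand-in chosen for illustration; in practice DESeq2's `vst` or `rlog` output is commonly used for this plot. The function names are hypothetical.

```python
import numpy as np


def log_cpm(counts, pseudocount=1.0):
    """log2 counts-per-million for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0) * 1e6  # divide each column by its library size
    return np.log2(cpm + pseudocount)


def pca_scores(counts, n_components=2):
    """Sample coordinates on the first principal components, plus the
    fraction of variance each component explains."""
    x = log_cpm(counts).T           # samples x genes
    x = x - x.mean(axis=0)          # centre every gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Plotting `scores[:, 0]` against `scores[:, 1]`, with the samples behind the 6 GC-failing files highlighted, shows whether they actually separate from the rest before you decide to drop anything.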