Question

I performed a Trimmomatic run for adapter removal. The QC report shows a drop in reads and the presence of overrepresented sequences. Seeking help!

0

14 months ago

Vijith ▴ 90

Excuse me for this long post:

I am performing a de novo genome assembly using Illumina paired-end short reads, and I am currently trimming the adapters. Below are the basic statistics and the adapter content from the FastQC report for R1.

Raw Reads

The basic statistics of the raw reads are given here.

The adapter content of the raw reads is given here.

I used Trimmomatic to trim the adapters, with the following settings:

ILLUMINACLIP:~/adapters/TruSeq3-PE.fa:2:30:10 MINLEN:36

Below are the basic statistics and adapter content of the trimmed reads.

**Basic Statistics**

Here, the output was:

Both surviving: 566832403 Forward only surviving: 39244376 Reverse only surviving: 0.00 Dropped reads: <1%
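As a rough check on the summary quoted above, the share of surviving pairs that lost their reverse mate can be worked out from the two counts given. This is only a sketch: the exact number of dropped pairs is not stated (only "<1%"), so it is left out of the denominator, and the variable names are illustrative.

```python
# Counts copied from the Trimmomatic summary quoted in the post.
both_surviving = 566_832_403
forward_only = 39_244_376
reverse_only = 0

# Assumption: dropped pairs are ignored, because the post only reports "<1%"
# and not an exact count, so these are shares of the surviving output only.
surviving_total = both_surviving + forward_only + reverse_only

both_share = both_surviving / surviving_total
forward_only_share = forward_only / surviving_total

print(f"Pairs kept intact: {both_share:.1%}")
print(f"Forward read only: {forward_only_share:.1%}")
```

Under that assumption, roughly 6.5% of the surviving output consists of forward reads without a mate.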

Now following are my questions:

Question 1

Can I go ahead with the assembly, given that there is no adapter content left in the reads? Should I be concerned about the loss of reads?

Question 2

I see that there are over-represented sequences in both read 1 and read 2. I am not sure whether I can leave them in or should trim them too. Can these over-represented sequences be trimmed using Trimmomatic? Any suggestions would be appreciated.

The following are the over-represented sequences for R1

The following are the over-represented sequences for R2

NGS illumina WGS • 1.1k views

ADD COMMENT • link updated 13 months ago by Ram 44k • written 14 months ago by Vijith ▴ 90

score 1 · Answer 1 · 2023-09-25

1

14 months ago

swbarnes2 14k

G is the letter you get if there is no fluorescence.

75% of your reads have a lot of G.

I don't think this run worked.

ADD COMMENT • link 14 months ago by swbarnes2 14k

0

@swbarnes2, thanks for the response. Can you tell me why and how this happens during the sequencing process, and whether it could negatively affect the assembly?

ADD REPLY • link 14 months ago by Vijith ▴ 90

1

You should filter out the poly-G reads. As @swbarnes2 said, these represent no signal, i.e. no sequence data. While the assembler may be able to ignore them, there is no point in leaving them in the input data.

ADD REPLY • link 14 months ago by GenoMax 147k
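
The filtering suggested above could be sketched roughly as follows. This is a minimal illustration, not a tool from the thread: the helper names (`parse_fastq`, `g_fraction`, `filter_poly_g`) and the 0.8 threshold are assumptions chosen for the example, and the parser assumes unwrapped four-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records (assumes unwrapped, well-formed input)."""
    buf = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []


def g_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G (0.0 for an empty read)."""
    return seq.upper().count("G") / len(seq) if seq else 0.0


def filter_poly_g(
    records: Iterable[FastqRecord], max_g_fraction: float = 0.8
) -> Iterator[FastqRecord]:
    """Keep records whose G fraction does not exceed max_g_fraction.

    The 0.8 default is an arbitrary illustrative threshold.
    """
    for rec in records:
        if g_fraction(rec[1]) <= max_g_fraction:
            yield rec


# Tiny made-up example: read2 is a pure poly-G read and should be removed.
example = [
    "@read1", "ACGTACGTACGTACGT", "+", "IIIIIIIIIIIIIIII",
    "@read2", "GGGGGGGGGGGGGGGG", "+", "IIIIIIIIIIIIIIII",
    "@read3", "ACGGGGTTACGATCGA", "+", "IIIIIIIIIIIIIIII",
]
kept = list(filter_poly_g(parse_fastq(example)))
print([rec[0] for rec in kept])
```

For paired-end data, both mates of a pair would need to be dropped together so that the R1 and R2 files stay in sync. For a dataset of this size, a dedicated read-trimming tool is likely a more practical choice than a pure-Python loop.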