I ran Trimmomatic for adapter removal. The QC report shows a drop in reads and the presence of over-represented sequences. Seeking help!
14 months ago
Vijith ▴ 90

Excuse me for this long post:

I am performing a de novo genome assembly using Illumina paired-end short reads. At present, I am at the adapter-trimming stage. Below are the basic statistics and the adapter content for R1, taken from the FastQC report.

Raw Reads

The basic statistics of raw reads given here

The adapter content of the raw reads given here!

I used Trimmomatic to trim the adapters, with the following settings:

ILLUMINACLIP:~/adapters/TruSeq3-PE.fa:2:30:10 MINLEN:36
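For readers unfamiliar with the step syntax: in Trimmomatic, the three numbers after the adapter file in `ILLUMINACLIP` are the seed mismatches, the palindrome clip threshold and the simple clip threshold, and `MINLEN` drops reads shorter than the given length. The sketch below only splits the settings above into named fields to make that explicit. The helper `parse_illuminaclip` is hypothetical and is not part of Trimmomatic.

```python
def parse_illuminaclip(step: str) -> dict:
    """Split an ILLUMINACLIP step string into named fields.

    Hypothetical helper for illustration only. It assumes the adapter
    path itself contains no ':' character.
    """
    fields = step.split(":")
    if len(fields) < 5 or fields[0] != "ILLUMINACLIP":
        raise ValueError(f"not a complete ILLUMINACLIP step: {step!r}")
    return {
        "adapter_fasta": fields[1],
        "seed_mismatches": int(fields[2]),
        "palindrome_clip_threshold": int(fields[3]),
        "simple_clip_threshold": int(fields[4]),
    }


# The settings quoted in the post.
parsed = parse_illuminaclip("ILLUMINACLIP:~/adapters/TruSeq3-PE.fa:2:30:10")
for key, value in parsed.items():
    print(f"{key}: {value}")
```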

Below, you can see the basic statistics and adapter content of the Trimmed reads.

**Basic Statistics**

Adapter content

Here, the output was:

Both surviving: 566832403
Forward only surviving: 39244376
Reverse only surviving: 0.00
Dropped reads: <1%
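To put the reported counts in perspective, the arithmetic below estimates how many pairs lost their reverse mate. It is a rough sketch that uses only the "both surviving" and "forward only surviving" counts quoted above; the reverse-only and dropped figures are left out because they are not given as plain read counts, so the result is an approximation rather than the exact loss.

```python
# Counts copied from the Trimmomatic summary in the post.
both_surviving = 566_832_403
forward_only = 39_244_376

# Pairs in which at least the forward read survived.
forward_total = both_surviving + forward_only

# Share of those pairs whose reverse mate was discarded.
orphan_fraction = forward_only / forward_total

print(f"Pairs with a surviving forward read: {forward_total}")
print(f"Of these, reverse mate lost: {orphan_fraction:.2%}")
```

If the share is in the single-digit percent range, as it appears to be here, the paired reads that remain still make up the large majority of the data.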

My questions are as follows:

Question 1

Can I go ahead with the assembly now that no adapter content is detected in the reads? Should I be concerned about the loss of reads?

Question 2

I see that there are over-represented sequences in both read 1 and read 2. I am not sure whether I can leave them as they are or whether I should trim them too. Can these over-represented sequences be trimmed using Trimmomatic? Any suggestions would be appreciated.

The following are the over-represented sequences for R1

The following are the over-represented sequences for R2

NGS illumina WGS • 1.1k views
14 months ago

G is the letter you get if there is no fluorescence.

75% of your reads have a lot of G.

I don't think this run worked.


@swbarnes2, thanks for the response. Can you tell me why and how this happens during sequencing, and whether it could negatively affect the assembly?


You should filter out the poly-G reads. As @swbarnes2 said, these represent no signal, i.e. no sequence data. While the assembler may be able to ignore them, there is no point in leaving them in the input data.
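As an illustration of what such filtering could look like, here is a minimal pure-Python sketch that drops a read pair when either mate is mostly G. The function names and the 50% G cutoff are assumptions made for this example, not values recommended in this thread, and the code expects standard four-line FASTQ records with R1 and R2 in the same order.

```python
import gzip
from itertools import islice


def open_fastq(path, mode="rt"):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def is_poly_g(seq, max_g_fraction=0.5):
    """Return True if more than max_g_fraction of the bases are G.

    The 0.5 default is an arbitrary, illustrative cutoff.
    """
    seq = seq.strip().upper()
    return bool(seq) and seq.count("G") / len(seq) > max_g_fraction


def filter_poly_g_pairs(r1_in, r2_in, r1_out, r2_out, max_g_fraction=0.5):
    """Write only the pairs in which neither mate is poly-G.

    Returns a (kept, dropped) tuple of pair counts.
    """
    kept = dropped = 0
    with open_fastq(r1_in) as f1, open_fastq(r2_in) as f2, \
            open_fastq(r1_out, "wt") as o1, open_fastq(r2_out, "wt") as o2:
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            if is_poly_g(rec1[1], max_g_fraction) or is_poly_g(rec2[1], max_g_fraction):
                dropped += 1
                continue
            o1.writelines(rec1)
            o2.writelines(rec2)
            kept += 1
    return kept, dropped
```

A call such as `filter_poly_g_pairs("R1.fastq.gz", "R2.fastq.gz", "R1.clean.fastq.gz", "R2.clean.fastq.gz")` (file names hypothetical) would write the filtered pairs and return the counts. For data sets of this size a dedicated tool is likely to be much faster; fastp, for example, has a `--trim_poly_g` option.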
