Mostly Poly-Gs on WGBS sequences.
11 months ago
marioperez • 0

I just received some sequencing data from WGBS libraries. After QC analysis, I see that poly-G sequences predominate in read 2. I read that NovaSeq systems call missing signal as poly-G, so I'm guessing these reads are sequenced short fragments, but they predominate in most of the sequenced libraries.

I would like a second opinion on the FastQC profiles of these samples, as no one in my lab can give me an expert opinion on them. Have you come across a QC profile like this?

What would be the most appropriate workflow to get useful information from these samples? What I tried was to remove adapters with Trim Galore, then, using fastp, I deactivated every filter except poly-G and poly-X trimming (if I kept fastp's quality and length filters, I was left with almost no reads). Would it be okay to continue with the analysis? Even after this filtering, read 2 doesn't seem to have the expected base content distribution.
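For reference, the fastp step described above might look roughly like the sketch below. It only builds the command line so the options can be inspected before running anything. The file names are hypothetical, and disabling fastp's own adapter trimming (on the basis that Trim Galore already handled adapters) is an assumption about the setup, not something stated in the post.

```python
import shlex

def fastp_polyg_only_cmd(r1_in, r2_in, r1_out, r2_out):
    """Build a fastp command that trims only poly-G and poly-X tails."""
    return [
        "fastp",
        "-i", r1_in, "-I", r2_in,        # paired-end inputs (read 1, read 2)
        "-o", r1_out, "-O", r2_out,      # paired-end outputs
        "--trim_poly_g",                 # trim poly-G tails at the 3' end
        "--trim_poly_x",                 # trim other homopolymer tails at the 3' end
        "--disable_adapter_trimming",    # assumption: adapters already removed upstream
        "--disable_quality_filtering",   # keep reads regardless of base quality
        "--disable_length_filtering",    # keep reads regardless of length
    ]

# Hypothetical file names, for illustration only.
cmd = fastp_polyg_only_cmd(
    "sample_R1_trimmed.fq.gz", "sample_R2_trimmed.fq.gz",
    "sample_R1_polyg.fq.gz", "sample_R2_polyg.fq.gz",
)
print(shlex.join(cmd))  # copy-pasteable shell command
```

With length filtering disabled, very short or empty reads can remain in the output, so it is worth checking the read length distribution afterwards.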

This is my first time doing an analysis with real experimental data.

[Image: Read 1 before filtering]

[Image: Read 2 before filtering]

[Image: Read 1 after filtering and trimming]

[Image: Read 2 after filtering]

NGS Methylation Poly-G WGBS illumina • 1.3k views

I read that NovaSeq systems call missing signal as poly-G, so I'm guessing these reads are sequenced short fragments, but they predominate in most of the sequenced libraries.

Poly-G stretches at the ends of reads generally come from short fragments, where sequencing reads through the adapter at the 3' end and on past the end of the molecule, into oblivion.
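One way to put a number on this is to count how many reads end in a long run of G. The following is a minimal sketch, assuming a standard four-line-per-record FASTQ file; the function names and the 10-base threshold are illustrative choices, not anything taken from FastQC or fastp.

```python
import gzip

def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip()

def polyg_tail_length(seq):
    """Length of the uninterrupted run of G at the 3' end of a read."""
    return len(seq) - len(seq.rstrip("G"))

def polyg_tail_fraction(seqs, min_run=10):
    """Fraction of reads whose 3' end is a run of at least min_run G bases."""
    total = tailed = 0
    for seq in seqs:
        total += 1
        if polyg_tail_length(seq) >= min_run:
            tailed += 1
    return tailed / total if total else 0.0
```

Calling `polyg_tail_fraction(read_fastq_sequences("sample_R2.fastq.gz"))` (hypothetical file name) on the raw read 2 file and again after trimming should show whether the poly-G tails are actually being removed.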

What I tried was to remove adapters with Trim Galore

Are you sure you did that correctly? Adapter trimming should show whether your reads are from short inserts. If that turns out to be the case, then there is not much you can do about it. You may end up having to redo this experiment if the reads are too short to be useful. Do you have a good reference?


The only hint I have that the adapters were removed is that, after running both tools, the number of reads and the file size decrease dramatically. After running Trim Galore, the file is almost half its original size, and the same again after fastp. Also, the adapter sequences no longer appear among the overrepresented sequences in the FastQC report after running Trim Galore. Is there another way to check this?

I don't know whether the available settings differ much between running the tool from the terminal and running it as a Galaxy tool. I am using Trim Galore on Galaxy with automatic adapter detection.
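A direct check for leftover adapter is to count the reads that still contain the start of the adapter sequence, before and after trimming. The sketch below assumes the standard Illumina adapter, whose first 13 bases are AGATCGGAAGAGC; if the library was built with a different kit, the sequence would need to be changed. The function names are illustrative.

```python
import gzip

# Assumption: standard Illumina adapter; swap in your kit's sequence if it differs.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"

def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip()

def adapter_fraction(seqs, adapter=ILLUMINA_ADAPTER_PREFIX):
    """Fraction of reads containing an exact match to the adapter prefix."""
    total = hits = 0
    for seq in seqs:
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0
```

A high fraction in the raw files that drops to near zero in the trimmed files would suggest that trimming worked. Exact matching misses adapters containing sequencing errors, so this gives a lower bound rather than a precise count.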


After running Trim Galore, the file is almost half its original size, and the same again after fastp.

That leads me to believe that you have a lot of short inserts. At this step the poly-Gs should also be gone. If that is the case, then you will need to decide whether the leftover reads are adequate for the analysis. If they are too short, they may no longer be usable.
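To help with that decision, a summary of the trimmed read lengths can be useful. This is a minimal sketch; the 20-base cutoff is only an illustrative threshold, and what counts as long enough depends on the aligner and the reference.

```python
import statistics

def length_summary(seqs, min_len=20):
    """Summarise read lengths and the share of reads at or above min_len."""
    lengths = [len(seq) for seq in seqs]
    if not lengths:
        return {"reads": 0, "median": 0, "shortest": 0, "longest": 0,
                "fraction_usable": 0.0}
    usable = sum(1 for n in lengths if n >= min_len)
    return {
        "reads": len(lengths),
        "median": statistics.median(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
        "fraction_usable": usable / len(lengths),  # reads with length >= min_len
    }
```

The input is any iterable of read sequences, for example the sequence lines of the trimmed FASTQ files. If most reads fall below the cutoff, that would support the short-insert explanation.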

10 months ago
Prash ▴ 280

I encountered something similar years ago; what I did was poly-G trimming followed by poly-X trimming. I am not sure whether recent versions of Trim Galore support this, but fastp does. Hope this helps, Prash
