I just received some sequencing data from WGBS libraries. After QC analysis, I see that poly-G stretches predominate in read 2. I read that NovaSeq systems call a missing signal as G, so I'm guessing these reads come from short fragments that were sequenced past their end. However, they make up most of the sequenced libraries.
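One way to put a number on "predominate" is to count how many read-2 sequences end in a run of Gs. The sketch below is a minimal, hypothetical helper (not part of FastQC, Trim Galore or fastp). It only counts an uninterrupted 3' G run, and the 10-base cutoff is an arbitrary assumption, so it will undercount tails that contain an occasional miscalled base.

```python
def polyg_tail_length(seq: str) -> int:
    """Length of the uninterrupted run of G at the 3' end of seq."""
    return len(seq) - len(seq.rstrip("G"))


def fraction_with_polyg_tail(seqs, min_run: int = 10) -> float:
    """Fraction of reads whose 3' end is a run of at least min_run Gs.

    `seqs` is any iterable of read sequences (e.g. the 2nd line of each
    FASTQ record). min_run=10 is an assumed threshold, not a standard.
    """
    total = hits = 0
    for s in seqs:
        total += 1
        if polyg_tail_length(s) >= min_run:
            hits += 1
    return hits / total if total else 0.0
```

Running it on read 1 and read 2 separately makes it easy to see whether the poly-G problem really is concentrated in read 2.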
I would like an extra opinion on the FastQC profiles of these samples, as no one in my lab has the expertise to assess them. Have you come across a QC profile like this?
What would be the most appropriate workflow to get useful information from these samples? What I tried was removing adapters with Trim Galore, then running fastp with every filter disabled except poly-G and poly-X trimming (if I kept fastp's quality and length filters, I was left with almost no reads). Would it be okay to continue with the analysis? Even after this filtering, read 2 doesn't seem to have an adequate base composition distribution.
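To make the poly-G trimming step concrete, here is a simplified stand-in for a poly-G tail trimmer. It is an assumption-laden sketch, not fastp's actual algorithm: it walks in from the 3' end, tolerates roughly one non-G base per 8 bases of tail, and only trims if the tail is at least `min_len` bases (fastp's own `--poly_g_min_len` also defaults to 10). Two known limitations: it will not trim a tail whose very last base is a non-G, and it can clip genuine G bases that sit right next to the tail.

```python
def trim_polyg(seq: str, min_len: int = 10, mismatch_every: int = 8) -> str:
    """Remove a 3' poly-G tail, tolerating roughly one non-G base per
    `mismatch_every` bases of tail.

    Simplified illustration only; real trimmers use their own rules.
    """
    tail = 0        # length of the longest G-starting suffix accepted so far
    mismatches = 0  # non-G bases seen while walking in from the 3' end
    for length, base in enumerate(reversed(seq), start=1):
        if base == "G":
            tail = length  # suffix from here to the 3' end starts on a G
        else:
            mismatches += 1
            if mismatches > length // mismatch_every:
                break  # too many non-G bases: assume we reached real sequence
    return seq[: len(seq) - tail] if tail >= min_len else seq
```

For example, `trim_polyg("ACTTACTTAC" + "G" * 10 + "A" + "G" * 10)` keeps only `"ACTTACTTAC"`: the single A inside the tail is tolerated, but the non-G bases of the insert stop the scan.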
This is my first time analyzing real experimental data.
[FastQC report: read 1, before filtering]
[FastQC report: read 2, before filtering]
[FastQC report: read 1, after filtering and trimming]
[FastQC report: read 2, after filtering]
Poly-G stretches at the end of reads generally come from short fragments: the sequencer reads through the insert, into the adapter on the 3' end, and then off the end of the molecule into oblivion.
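A toy model of that read-through may help. The sketch below is purely illustrative: it uses only the 13-bp prefix of the Illumina TruSeq adapter as a stand-in for the whole adapter (a real read would run through the full adapter and index sequence first), and it assumes that every cycle past the end of the molecule is reported as G, as happens on two-colour instruments such as NovaSeq when there is no signal.

```python
# Hypothetical stand-in for the adapter: only the 13-bp Illumina TruSeq
# prefix. Real molecules carry the full adapter (plus index) after the insert.
ADAPTER = "AGATCGGAAGAGC"


def simulate_read(insert: str, read_len: int, adapter: str = ADAPTER) -> str:
    """Toy model of the bases reported for one library molecule."""
    molecule = insert + adapter
    if len(molecule) >= read_len:
        return molecule[:read_len]  # insert (plus maybe some adapter) fills the read
    # Past the end of the molecule there is no template and hence no signal,
    # which two-colour chemistry reports as G.
    return molecule + "G" * (read_len - len(molecule))
```

With an 8-bp insert and a 30-cycle read, `simulate_read("ACGTTGCA", 30)` gives the insert, then the adapter, then 9 Gs. A long insert never reaches the adapter, so its read contains no poly-G at all.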
Are you sure you did that correctly? Adapter trimming should show whether your reads come from short inserts. If that turns out to be the case, there is not much you can do about it, and you may end up having to redo the experiment if the reads are too short to be useful. Do you have a good reference genome?
My only indication that the adapters were removed is that after running both tools, the number of reads and the file size drop dramatically: after Trim Galore the file is almost half its original size, and it halves again after fastp. Also, the adapter sequences no longer appear among the overrepresented sequences in the FastQC report after running Trim Galore. Is there another way to check this?
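Besides Trim Galore's trimming report (the underlying Cutadapt summary lists how many reads contained adapter, if your Galaxy run produced that report), a quick independent check is to compare the raw and trimmed FASTQ files directly: how many reads still contain the adapter, and how the read-length distribution shifted. The sketch below assumes the 13-bp Illumina adapter prefix `AGATCGGAAGAGC`, which is one of the sequences Trim Galore's auto-detection screens for; substitute whichever adapter your report says was detected. It only counts full 13-bp matches, so short partial adapters at the very 3' end are missed.

```python
import gzip
from collections import Counter

# Assumed adapter: the Illumina TruSeq prefix. Replace it with the one
# reported by your trimming run if it differs.
ADAPTER = "AGATCGGAAGAGC"


def iter_fastq_seqs(path):
    """Yield the sequence line of each record in a 4-line FASTQ (.gz allowed)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # 2nd line of each 4-line record is the sequence
                yield line.rstrip("\n")


def adapter_summary(seqs, adapter=ADAPTER):
    """Return (reads, reads containing the full adapter, read-length Counter)."""
    total = with_adapter = 0
    lengths = Counter()
    for s in seqs:
        total += 1
        lengths[len(s)] += 1
        if adapter in s:
            with_adapter += 1
    return total, with_adapter, lengths
```

For example, `adapter_summary(iter_fastq_seqs("sample_R2_trimmed.fq.gz"))` (hypothetical file name) on the raw and the trimmed file should show the adapter count falling to near zero and the length histogram shifting toward short reads if the inserts are indeed short.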
I don't know whether the settings available when running the tool from the terminal differ much from those in the Galaxy tool. I am using Trim Galore on Galaxy with automatic adapter detection.
That leads me to believe that you have a lot of short inserts. After this step the poly-Gs should also be gone, because adapter trimming removes the adapter together with everything downstream of it, including the poly-G tail. If that is the case, then you will need to decide whether the leftover reads are adequate for the analysis. If they are too short, they may no longer be usable.
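To help make that decision, it can be useful to tabulate what fraction of the trimmed reads would survive several minimum-length cutoffs. The thresholds below are illustrative assumptions, not recommendations. For reference, Trim Galore's default `--length` cutoff is 20 bp, and in paired-end mode a pair is dropped if either read falls below it. Very short reads also tend to map ambiguously, which is worse in bisulfite space because most Cs have been converted.

```python
def length_survival(lengths, thresholds=(20, 30, 50)):
    """Fraction of reads at least as long as each threshold (in bp).

    `lengths` is any iterable of read lengths, e.g. len() of each
    sequence in a trimmed FASTQ. The thresholds are illustrative only.
    """
    lengths = list(lengths)
    n = len(lengths)
    return {
        t: (sum(1 for L in lengths if L >= t) / n if n else 0.0)
        for t in thresholds
    }
```

For example, `length_survival([10, 25, 35, 60])` returns `{20: 0.75, 30: 0.5, 50: 0.25}`. If only a small fraction of reads clears a length your aligner can place confidently, that is a strong hint the library may need to be redone.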