Overrepresented sequence in reverse read
5 months ago

Hello Community, I just got my ATAC-seq results and, while doing QC, I found an overrepresented sequence in the R2 reads only, not in the R1 reads. After removing my adapter (Illumina Nextera), I still found that overrepresented sequence in the R2 reads (GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG). It represents only 0.18% of the reads. I then tried cutadapt. After processing with cutadapt, I still found an overrepresented sequence in my R2 file, with no sequence pattern given, and the percentage increased to 0.2%. Moreover, the processing affected my sequence length distribution. Does anybody know how to get rid of this sequence? Is 0.18% tolerable, or do I need to remove it?

Tags: FastQC, Genomics, Trimmomatic, Sequencing, ATAC

Is this sequence at the beginning of your R2 reads or in the middle of the reads? You mentioned cutadapt, but you tagged Trimmomatic. Both are mainly used to clip the very ends of reads, so if your GGG* sequence is inside the read, they will not be able to remove it.

I would suggest running the read alignment and proceeding. In the end, your GGG* sequence will not match anything in the reference and will be discarded.
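To answer the "beginning or middle" question before choosing a tool, something like the following could be run on the R2 file. It is only a minimal sketch: the function names, the 10-base threshold, and the file name in the usage note are assumptions for illustration, not part of FastQC, cutadapt, or Trimmomatic.

```python
import gzip
import re
from collections import Counter


def locate_poly_g(seq, min_run=10):
    """Say where the longest run of at least min_run G bases sits in a read.

    Returns one of "none", "whole", "start", "end", "middle".
    min_run=10 is an assumed threshold, chosen only for illustration.
    """
    runs = [m.span() for m in re.finditer("G{%d,}" % min_run, seq.upper())]
    if not runs:
        return "none"
    # Keep the longest run; ties resolve to the first one found.
    start, end = max(runs, key=lambda span: span[1] - span[0])
    if start == 0 and end == len(seq):
        return "whole"
    if start == 0:
        return "start"
    if end == len(seq):
        return "end"
    return "middle"


def tally_poly_g(fastq_path, min_run=10):
    """Count poly-G positions over all reads of a (optionally gzipped) FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence is the 2nd line of each record
                counts[locate_poly_g(line.strip(), min_run)] += 1
    return counts
```

Calling `tally_poly_g("sample_R2.fastq.gz")` (a hypothetical file name) returns a counter of categories. If most hits fall under "end" or "whole", an end-trimming tool should be able to deal with them; hits under "middle" would not be reachable by clipping read ends.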


Hi, thanks, that was helpful information. I used fastp to remove the poly-G, and the QC seems to be fine now.

5 months ago
Umer ▴ 130

You can read about why poly-G strings appear in the data here: https://sequencing.qcfail.com/articles/illumina-2-colour-chemistry-can-overcall-high-confidence-g-bases/. To remove them, fastp has the option --trim_poly_g, which trims poly-G tails from the reads.
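As a rough illustration of this suggestion, the sketch below builds a paired-end fastp command with poly-G trimming switched on, and shows in plain Python what trimming a poly-G tail means. The file names and helper function names are hypothetical. The Python trimmer is a simplification that only removes an exact run of trailing G bases; it is not fastp's actual algorithm.

```python
def fastp_poly_g_command(r1, r2, out1, out2, min_len=10):
    """Build (but do not run) a fastp command with poly-G tail trimming enabled.

    --poly_g_min_len sets the minimum tail length fastp treats as poly-G;
    10 is used here as an illustrative value.
    """
    return [
        "fastp",
        "--in1", r1, "--in2", r2,
        "--out1", out1, "--out2", out2,
        "--trim_poly_g",
        "--poly_g_min_len", str(min_len),
    ]


def trim_poly_g_tail(seq, qual, min_len=10):
    """Remove a trailing run of G bases if it is at least min_len long.

    Simplified stand-in for illustration: exact matches only, no mismatches.
    Returns the (sequence, quality) pair, trimmed to the same length.
    """
    stripped = seq.rstrip("G")
    if len(seq) - len(stripped) < min_len:
        return seq, qual  # tail too short to call poly-G; leave the read alone
    return stripped, qual[:len(stripped)]


# Hypothetical file names; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = fastp_poly_g_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                           "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz")
print(" ".join(cmd))
```

Note that a read made up entirely of G bases trims down to an empty sequence in this sketch, so a minimum-length filter after trimming would still be needed to drop such reads.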


Thank you so much, it was really helpful. I was able to remove the poly-G with fastp.
