Hello Community, I just got my ATAC-seq results and, while doing QC, I found an over-represented sequence only in the R2 reads and not in the R1 reads. After removing my adapter (Illumina Nextera), I still found that over-represented sequence in the R2 reads (GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG). It represents only 0.18%. I then tried cutadapt. After processing with cutadapt, I still found an over-represented sequence in my R2 file, this time with no sequence pattern given, and the percentage increased to 0.2%. Moreover, it worsened my sequence length distribution in the QC report. Does anybody know how to get rid of this sequence, and is 0.18% tolerable or do I need to remove it?
Is this sequence at the beginning of your R2 reads or in the middle of the reads? You mentioned cutadapt but you tagged Trimmomatic. Both are mainly used to clip the very ends of your reads, so if your GGG* sequence is inside your reads, they will not be able to remove it.
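To answer the "beginning or middle" question, a quick script can help. The following is a minimal sketch in Python, not part of any of the tools mentioned; the function name `locate_g_run` and the 10-base threshold are assumptions chosen for illustration. It reports where the longest run of G's sits in a single read sequence.

```python
import re

def locate_g_run(seq, min_len=10):
    """Report where the longest run of G's (at least min_len bases) sits in a read.

    Returns "whole", "start", "end", "middle", or None if no such run exists.
    min_len=10 is an arbitrary illustrative threshold, not a tool default.
    """
    # Each match is a maximal stretch of consecutive G's of length >= min_len.
    runs = list(re.finditer(r"G{%d,}" % min_len, seq))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    at_start = longest.start() == 0
    at_end = longest.end() == len(seq)
    if at_start and at_end:
        return "whole"
    if at_start:
        return "start"
    if at_end:
        return "end"
    return "middle"
```

Running this over the sequence lines of the R2 FASTQ and tallying the results would show whether the G's are mostly whole reads, tails, or internal stretches, which in turn tells you whether an end-clipping tool can help.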
I would suggest running your read alignment and proceeding. In the end, your GGG* reads should not align to anything and will be discarded.
Hi, thanks, that was helpful information. I used fastp to remove the poly-G and the QC seems to be fine now.
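For anyone landing here later: fastp has a poly-G trimming option (`--trim_poly_g`). To illustrate the idea only, here is a rough Python sketch of trimming a poly-G tail from the 3' end of a read. It is not fastp's actual algorithm: it requires an exact, uninterrupted run of G's, and the function name `trim_poly_g_tail` and the `min_len=10` threshold are assumptions made for this example.

```python
def trim_poly_g_tail(seq, qual, min_len=10):
    """Remove a trailing run of G's from a read if it is at least min_len long.

    seq and qual are the sequence and quality strings of one FASTQ record.
    Exact-match illustration only; real trimmers may behave differently.
    """
    stripped = seq.rstrip("G")
    tail_len = len(seq) - len(stripped)
    if tail_len < min_len:
        # Tail too short to be treated as a poly-G artifact: keep the read as-is.
        return seq, qual
    # Cut the quality string to the same length so the record stays consistent.
    return stripped, qual[:len(stripped)]
```

Note that a read consisting only of G's comes back empty, so a real pipeline would also need a minimum-length filter after trimming to drop such reads.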