I am analyzing an RNA-seq paired-end sequencing dataset. I have used cutadapt before to trim overrepresented sequences identified in a FastQC report. However, this time there is a slight twist.
One of the read files, I4_R1.fastq, has the following FastQC output:
>>Overrepresented sequences warn
#Sequence Count Percentage Possible Source
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG 36771 0.14170337546219017 TruSeq Adapter, Index 6 (100% over 49bp)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC 36534 0.14079005518304247 TruSeq Adapter, Index 6 (100% over 50bp)
>>END_MODULE
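A quick way to confirm what FastQC matched in R1 is to pull out the 6 bases between the flanking adapter motifs, which should be the sample index. The sketch below assumes the TruSeq index adapter layout `...CTCCAGTCAC` + 6-nt index + `ATCTCGTATG...` (as in FastQC's contaminant list); `extract_index` is a hypothetical helper, not part of FastQC or cutadapt.

```python
from typing import Optional

# Overrepresented R1 sequence copied from the FastQC report above.
R1_SEQ = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG"

# Assumed TruSeq index-adapter layout: PRE_INDEX + <index> + POST_INDEX.
PRE_INDEX = "CTCCAGTCAC"
POST_INDEX = "ATCTCGTATG"


def extract_index(seq: str, index_len: int = 6) -> Optional[str]:
    """Return the index bases between the two flanking motifs, or None."""
    start = seq.find(PRE_INDEX)
    if start == -1:
        return None
    start += len(PRE_INDEX)
    index = seq[start:start + index_len]
    # Only accept the call if the downstream motif follows the index directly.
    if seq[start + index_len:].startswith(POST_INDEX):
        return index
    return None


if __name__ == "__main__":
    print(extract_index(R1_SEQ))  # GCCAAT
```

If the printed index matches the one used for this sample, the R1 hit is simply adapter read-through rather than contamination from another library.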
while the other, I4_R2.fastq, has the following:
>>Overrepresented sequences warn
#Sequence Count Percentage Possible Source
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC 37938 0.14620061076077806 Illumina Single End PCR Primer 1 (100% over 50bp)
GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG 35122 0.1353486702287956 Illumina Single End PCR Primer 1 (100% over 50bp)
>>END_MODULE
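Comparing the sequences directly helps with the "indecisive" look. In each file the two entries are the same molecule read one base apart, so each file really carries a single adapter. The R1 and R2 adapters share the first 13 bases (`AGATCGGAAGAGC`) and diverge after that, which is why FastQC assigns them different names. This is a minimal sketch; the helper names are hypothetical.

```python
# The two overrepresented sequences per file, copied from the FastQC reports above.
R1 = ("AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG",
      "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC")
R2 = ("AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC",
      "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG")


def shifted_by_one(a: str, b: str) -> bool:
    """True if b is the same sequence as a, offset by one base."""
    return a[1:] == b[:-1]


def common_prefix(a: str, b: str) -> str:
    """Longest shared prefix of two sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


if __name__ == "__main__":
    print(shifted_by_one(*R1), shifted_by_one(*R2))  # True True
    print(common_prefix(R1[0], R2[0]))               # AGATCGGAAGAGC
```

Reads that begin with the adapter itself carry little or no insert (typically adapter dimers or very short fragments); at roughly 0.14% per sequence here, that fraction is small.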
I'm having trouble working out what to pass to cutadapt's `-a` and `-A` options. The forward and reverse reads seem to report different adapters. Can the two reads of the same pair really carry different adapters? Does this point to a fundamental flaw in the library preparation?
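In paired-end mode, `-a` is the 3' adapter trimmed from the first read and `-A` the one trimmed from the second, so different adapters per mate are expected. The cutadapt documentation gives `AGATCGGAAGAGCACACGTCTGAACTCCAGTCA` (R1) and `AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT` (R2) for TruSeq libraries; each is a prefix of the sequence FastQC flagged in the matching file. For this data that would give something like `cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o out.1.fastq -p out.2.fastq I4_R1.fastq I4_R2.fastq` (output names are placeholders). The sketch below illustrates the per-mate idea with exact matching only; cutadapt itself also tolerates mismatches and indels, so treat this as an illustration, not a replacement.

```python
from typing import Tuple

# TruSeq 3' adapters as listed in the cutadapt documentation.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # value for -a
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # value for -A


def trim_3prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut a 3' adapter: a full match anywhere, or a partial match at the read end.

    Simplified: exact matching only (no error tolerance, unlike cutadapt).
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Adapter runs off the end of the read: find the longest read suffix
    # that equals a prefix of the adapter, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_pair(r1: str, r2: str) -> Tuple[str, str]:
    """Each mate gets its own adapter, mirroring -a (R1) and -A (R2)."""
    return trim_3prime(r1, ADAPTER_R1), trim_3prime(r2, ADAPTER_R2)


if __name__ == "__main__":
    # Hypothetical short inserts followed by adapter read-through.
    r1 = "ACGTTGCA" + ADAPTER_R1[:12]
    r2 = "TGCAACGT" + ADAPTER_R2
    print(trim_pair(r1, r2))  # ('ACGTTGCA', 'TGCAACGT')
```

Trimming R2 with the R1 adapter would leave most of the R2 adapter in place, because the two diverge after their shared 13-base start.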
Thanks in advance.
Thank you for your reply. Conversely, is it also normal to have no overrepresented sequences at all?
It depends on the dataset/experiment.