Question

Trimming overrepresented sequences flagged by FastQC in paired-end RNA-Seq data


7.3 years ago

Shaurya Jauhari ▴ 50

I am analyzing paired-end RNA-Seq data. I have used cutadapt before to trim overrepresented sequences reported by FastQC, but this time there is a twist.

One of the read files, I4_R1.fastq, has the following:

>>Overrepresented sequences    warn
#Sequence    Count    Percentage    Possible Source
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG    36771 0.14170337546219017    TruSeq Adapter, Index 6 (100% over 49bp)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC    36534 0.14079005518304247    TruSeq Adapter, Index 6 (100% over 50bp)
>>END_MODULE
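The two R1 rows look like the same adapter read-through shifted by one base: the second row is the first one without its leading A, plus a trailing C. A minimal Python sketch to check this by merging a suffix/prefix overlap; `merge_overlapping` and its `min_overlap` default are illustrative assumptions, not part of FastQC or cutadapt.

```python
from typing import Optional

# The two R1 rows reported by FastQC (copied from the module above).
R1_ROW1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG"
R1_ROW2 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC"


def merge_overlapping(a: str, b: str, min_overlap: int = 20) -> Optional[str]:
    """Merge `a` and `b` if a suffix of `a` equals a prefix of `b`.

    Hypothetical helper: tries the longest proper overlap first and
    returns None if no overlap of at least `min_overlap` bases exists.
    """
    for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


if __name__ == "__main__":
    merged = merge_overlapping(R1_ROW1, R1_ROW2)
    print(merged, len(merged))
```

The same check on the two R2 rows below likewise yields a single 51-bp sequence, so each file is really reporting one adapter, not two.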

while the other, I4_R2.fastq, has the following:

>>Overrepresented sequences    warn
#Sequence    Count    Percentage    Possible Source
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC    37938 0.14620061076077806    Illumina Single End PCR Primer 1 (100% over 50bp)
GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG    35122 0.1353486702287956    Illumina Single End PCR Primer 1 (100% over 50bp)
>>END_MODULE
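To pull these rows out programmatically, a minimal Python sketch that parses the module from FastQC's text report (fastqc_data.txt inside each FastQC output folder/zip), assuming the columns are tab-separated as shown in the header line. `OverrepRow` and `parse_overrepresented` are hypothetical names.

```python
from typing import List, NamedTuple


class OverrepRow(NamedTuple):
    sequence: str
    count: int
    percentage: float
    source: str


def parse_overrepresented(text: str) -> List[OverrepRow]:
    """Return the rows of the 'Overrepresented sequences' module."""
    rows: List[OverrepRow] = []
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
        elif in_module and line.startswith(">>END_MODULE"):
            break
        elif in_module and line and not line.startswith("#"):
            # Columns: Sequence, Count, Percentage, Possible Source (tab-separated).
            seq, count, pct, source = line.split("\t")
            rows.append(OverrepRow(seq, int(count), float(pct), source))
    return rows


# The I4_R2 module from above, written with explicit tabs.
R2_MODULE = (
    ">>Overrepresented sequences\twarn\n"
    "#Sequence\tCount\tPercentage\tPossible Source\n"
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC\t37938\t0.14620061076077806\t"
    "Illumina Single End PCR Primer 1 (100% over 50bp)\n"
    "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG\t35122\t0.1353486702287956\t"
    "Illumina Single End PCR Primer 1 (100% over 50bp)\n"
    ">>END_MODULE\n"
)

if __name__ == "__main__":
    for row in parse_overrepresented(R2_MODULE):
        print(row.count, row.percentage, row.source)
```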

I can't work out what to pass to cutadapt's "-a" and "-A" options, because the forward (R1) and reverse (R2) reads are flagged for different sequences. Can the two reads of the same pair really carry different adapters? Or is this pointing to a fundamental flaw in the library preparation?
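As far as I know, different -a and -A values are expected for a TruSeq-style library: read 1 runs into the index-side adapter and read 2 into the other adapter, and the cutadapt documentation itself lists different -a and -A sequences for TruSeq. Below is a hedged Python sketch that only assembles (does not run) a paired-end cutadapt call from the top row of each report; the output filenames are hypothetical, and only cutadapt's documented -a, -A, -o and -p options are used.

```python
import os
import shlex
from typing import List

# Top overrepresented sequence from each FastQC report above.
R1_TOP = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG"
R2_TOP = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC"


def cutadapt_paired_cmd(adapter_r1: str, adapter_r2: str,
                        in1: str, in2: str, out1: str, out2: str) -> List[str]:
    """Build a paired-end cutadapt argument list.

    -a: 3' adapter for read 1; -A: 3' adapter for read 2;
    -o / -p: output files for read 1 / read 2.
    """
    return ["cutadapt", "-a", adapter_r1, "-A", adapter_r2,
            "-o", out1, "-p", out2, in1, in2]


if __name__ == "__main__":
    # Both mates start with the same Illumina adapter core and diverge after it.
    print("shared prefix:", os.path.commonprefix([R1_TOP, R2_TOP]))
    cmd = cutadapt_paired_cmd(R1_TOP, R2_TOP,
                              "I4_R1.fastq", "I4_R2.fastq",
                              "I4_R1.trimmed.fastq", "I4_R2.trimmed.fastq")
    print(shlex.join(cmd))
```

Passing the full 50-bp row ties the R1 adapter to this sample's index bases. To my knowledge the more portable choice is the shorter TruSeq pair given in the cutadapt documentation, AGATCGGAAGAGCACACGTCTGAACTCCAGTCA (-a) and AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT (-A), which are prefixes of the R1 and R2 rows above.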

Thanks in advance.


score 4 · Answer 1 · 2017-08-08


GenoMax 147k

Use bbduk.sh from the BBMap suite. The BBMap bundle includes a list of commonly used adapter/primer sequences in resources/adapters.fa. You can use this file to scan for all of these contaminants at the same time.
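To illustrate the idea of scanning for many contaminants at once, here is a toy Python sketch of k-mer based right-trimming, loosely analogous to bbduk.sh with ktrim=r: every k-mer of every reference adapter goes into one set, and each read is cut at the first window that hits the set. It is not BBDuk's implementation (no reverse complements, no Hamming-distance tolerance, no shorter k-mers at the read end), and the two-record FASTA is a stand-in for resources/adapters.fa with hypothetical record names. For real data, the BBDuk guide's adapter-trimming recipe is along the lines of `bbduk.sh in1=I4_R1.fastq in2=I4_R2.fastq out1=I4_R1.clean.fastq out2=I4_R2.clean.fastq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo` (output names hypothetical).

```python
from typing import Dict, Iterable, Optional, Set

# Stand-in for resources/adapters.fa; record names are hypothetical.
ADAPTERS_FA = """\
>truseq_r1_example
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
>truseq_r2_example
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
"""


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif line and name is not None:
            records[name] += line.upper()
    return records


def build_kmer_set(seqs: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer from every reference sequence into one set."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def ktrim_right(read: str, kmers: Set[str], k: int) -> str:
    """Cut the read at the first k-mer that matches any reference k-mer."""
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            return read[:i]
    return read


if __name__ == "__main__":
    K = 23
    kmers = build_kmer_set(read_fasta(ADAPTERS_FA).values(), K)
    insert = "ACGTTGCAACGTTGCAACGT"  # 20 bp of made-up "biological" sequence
    read = insert + "AGATCGGAAGAGCACACGTCTGAACT"  # followed by R1 adapter read-through
    print(ktrim_right(read, kmers, K))
```

Because both adapters sit in the same k-mer set, one pass handles R1 and R2 read-through alike, which is why a single reference file covers both mates.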


Thank you for your reply. Conversely, is it also normal to have no overrepresented sequences at all?
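As I understand FastQC's documentation, this module warns when any single sequence makes up more than 0.1% of all reads and fails above 1%, so an empty list only means that no sequence crossed 0.1%. The Percentage column is Count / total reads × 100, so the I4_R1 rows also reveal the library size. A small Python sketch of that arithmetic; the threshold constants assume FastQC's default settings.

```python
from typing import Iterable

# FastQC's default thresholds for this module, in percent of total reads.
WARN_PCT = 0.1
FAIL_PCT = 1.0


def implied_total_reads(count: int, percentage: float) -> float:
    """Invert Percentage = Count / Total * 100 to recover Total."""
    return count * 100.0 / percentage


def module_status(percentages: Iterable[float]) -> str:
    """Status implied by the largest single-sequence percentage."""
    top = max(percentages, default=0.0)
    if top > FAIL_PCT:
        return "fail"
    if top > WARN_PCT:
        return "warn"
    return "pass"


if __name__ == "__main__":
    r1_rows = [(36771, 0.14170337546219017), (36534, 0.14079005518304247)]
    for count, pct in r1_rows:
        print(round(implied_total_reads(count, pct)))  # about 25.9 million reads
    print(module_status(pct for _, pct in r1_rows))  # "warn", as in the report
```

Both I4_R1 rows point to the same total of roughly 25.9 million reads, and each is only about 0.14% of it, just over the warning line.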