Question

Identify adapter sequences for trimming from Illumina paired end fastq files

1

6.4 years ago

mohammedtoufiq91 ▴ 260

Hi,

I am working with unaligned Illumina paired-end data. I would like to first identify the adapter sequences present in the data and then trim the reads accordingly. Is there a way to identify the adapter sequences? Please let me know which tools to use.

Thank you, Toufiq

RNA-Seq Adapter trimming QC Fastq • 11k views

ADD COMMENT • link updated 6.4 years ago by benformatics 4.1k • written 6.4 years ago by mohammedtoufiq91 ▴ 260

score 5 · Answer 1 · 2019-02-14

5

6.4 years ago

GenoMax 152k

Use the BBMap suite (reproduced from here):

If you have paired reads, and enough of the reads have inserts shorter than read length, you can identify adapter sequences with BBMerge, like this (they will be printed to adapters.fa):

bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa

Alternatively, the adapter used may be among the known sequences in the adapters.fa file included with BBMap (in its resources directory). In that case, you can do this:

bbduk.sh in1=r1.fq in2=r2.fq k=23 ref=adapters.fa stats=stats.txt

stats.txt will then list the names of adapter sequences found, and their frequency.
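Once the adapters are known, the trimming step itself might look like the sketch below. The parameters follow the adapter-trimming example in the BBDuk usage guide. The output names clean1.fq and clean2.fq are placeholders, and adapters.fa is assumed to be either the file written by BBMerge or the copy bundled with BBMap.

```shell
# Placeholder file names; adjust to your data.
r1=r1.fq
r2=r2.fq
adapters=adapters.fa   # from bbmerge.sh outa=..., or the copy bundled with BBMap

# ktrim=r        trim the adapter match and everything to its right (3' side)
# k=23 mink=11   match 23-mers, allowing shorter k-mers (down to 11) at read ends
# hdist=1        allow one mismatch per k-mer
# tpe            trim both reads of a pair to the same length
# tbo            also trim adapters detected from read-pair overlap
trim_cmd="bbduk.sh in1=$r1 in2=$r2 out1=clean1.fq out2=clean2.fq ref=$adapters ktrim=r k=23 mink=11 hdist=1 tpe tbo"

if command -v bbduk.sh >/dev/null 2>&1; then
    $trim_cmd
else
    echo "bbduk.sh not found on PATH; the command would be: $trim_cmd"
fi
```

Re-running the stats command on clean1.fq and clean2.fq afterwards is a simple way to check that the adapter counts have dropped.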

ADD COMMENT • link 6.4 years ago by GenoMax 152k

0

Thank you. I was able to identify the adapters in R1.fq and R2.fq. Now I would like to know whether these are 5' forward/reverse or 3' forward/reverse adapters. Is there a way to identify this?

ADD REPLY • link 6.4 years ago by mohammedtoufiq91 ▴ 260

score 0 · Answer 2 · 2019-02-14

0

6.4 years ago

benformatics 4.1k

fastp is a new tool that is almost as fast as bbduk and can automatically detect 5' or 3' adapters for both paired-end data (detection must be enabled manually) and single-end data.

The adapters are evaluated by analyzing the tails of the first ~1M reads.

So if you have more complicated or multiple adapters this may not be ideal.
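For the 5' versus 3' question, a quick check that does not rely on any tool's auto-detection is to look at where a known adapter sequence starts within the reads. In standard Illumina libraries the adapter shows up toward the 3' end of a read when the insert is shorter than the read length, so start positions spread across the read suggest 3' read-through, while a pile-up at position 1 points to adapter dimers. The function name adapter_positions below is made up for illustration, and the sketch only finds exact matches.

```shell
# Tally the 1-based start position of an adapter sequence in every read
# that contains it. Expects 4-line FASTQ records on stdin and prints
# "position count" pairs sorted by position.
adapter_positions() {
    awk -v a="$1" '
        NR % 4 == 2 { p = index($0, a); if (p) n[p]++ }
        END { for (p in n) print p, n[p] }
    ' | sort -n
}

# AGATCGGAAGAGC is the shared prefix of the Illumina TruSeq adapters;
# substitute the sequence reported for your own library.
# Usage on the first 1M reads (file name is a placeholder):
#   head -n 4000000 r1.fq | adapter_positions AGATCGGAAGAGC
```

For gzipped input, decompress to stdout first (for example with zcat) and pipe the result into the function.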

ADD COMMENT • link 6.4 years ago by benformatics 4.1k

0

Thank you. I ran this program, but it did not report any specific adapter.

./fastp -i <input1> -I <input2> -o R1.fastq.gz -O R2.fastq.gz --disable_adapter_trimming --detect_adapter_for_pe --html Report_sample.html

The .html file only reports the duplication rate, insert size estimation, before/after filtering read quality, before/after filtering base content, and before/after filtering k-mer counting.

ADD REPLY • link 6.4 years ago by mohammedtoufiq91 ▴ 260

1

If you use "--disable_adapter_trimming" then it does not search for adapters...

ADD REPLY • link 6.4 years ago by benformatics 4.1k
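With that flag removed, the call might look like the sketch below. The input and output file names are placeholders, and the --json report is an optional extra that was not in the original command.

```shell
# Placeholder file names; adjust to your data.
in1=R1.fastq.gz
in2=R2.fastq.gz

# Same call as before, but without --disable_adapter_trimming, so that
# --detect_adapter_for_pe can take effect.
fastp_cmd="fastp -i $in1 -I $in2 -o trimmed_R1.fastq.gz -O trimmed_R2.fastq.gz --detect_adapter_for_pe --html Report_sample.html --json Report_sample.json"

if command -v fastp >/dev/null 2>&1; then
    $fastp_cmd
else
    echo "fastp not found on PATH; the command would be: $fastp_cmd"
fi
```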

0

Thank you. Another question: is it recommended to trim the adapters for Illumina paired-end data with 2 x 150 bp reads?

ADD REPLY • link 6.4 years ago by mohammedtoufiq91 ▴ 260

2

If adapters are present, they should be trimmed, especially if you are going to do any de novo work with your data.

ADD REPLY • link 6.4 years ago by GenoMax 152k