Question

Do you need to define adapter sequences for trimming and QC tools?

2.5 years ago

amy__ ▴ 190

Hi,

I have around 70 samples which have undergone WES.

I have a list of the adapters used for each sample; however, it seems that the sequencing company used a different adapter for every sample. Originally, I was going to give the adapter sequences to fastp as input and run it on all samples in parallel. However, because the adapter sequences are all different, I don't think I can do this.

Is there a way to just run fastp with its default settings so that it finds the adapters itself, or is it better practice to provide each individual adapter sequence?

Thanks! Amy

Tags: adapters, fastp, illumina, trimming

However, because the adapter sequences are all different, I don't think I can do this.

You absolutely can. In Illumina sequencing there is a core sequence at the beginning of the adapters that all of them share, as @Istvan showed below. Whenever adapter sequence gets read (which happens when the insert is shorter than the read length), it is therefore going to be at the 3' end of the reads (unless you have adapter dimers). Scanning/trimming programs identify this core sequence and then trim it, together with the rest of the read 3' of it.

Reply by GenoMax 147k · 2.5 years ago
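
The "find the core, cut everything 3' of it" logic described in the reply above can be sketched in a few lines of Python. This is a minimal illustration, not how fastp or bbduk actually implement it: it uses exact matching only (real trimmers tolerate sequencing errors), and the core AGATCGGAAGAGC is an assumption (the Illumina universal adapter prefix used for TruSeq-style libraries), not a sequence taken from this thread. `trim_3prime_adapter` and `min_overlap` are hypothetical names.

```python
# Assumption: TruSeq-style Illumina adapter core; swap in your kit's sequence.
CORE = "AGATCGGAAGAGC"


def trim_3prime_adapter(read: str, core: str = CORE, min_overlap: int = 3) -> str:
    """Cut the read at the adapter core and drop everything 3' of it.

    Exact matching only; real trimmers also allow mismatches.
    """
    # Case 1: the whole core occurs in the read (an adapter dimer gives pos 0,
    # so the whole read is removed).
    pos = read.find(core)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway into the adapter, so only a prefix of the
    # core is visible at the 3' end. Try the longest prefixes first.
    for length in range(min(len(core) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(core[:length]):
            return read[: len(read) - length]
    # No adapter evidence: leave the read unchanged.
    return read


if __name__ == "__main__":
    print(trim_3prime_adapter("ACGTACGTAC" + CORE + "ACACGT"))  # ACGTACGTAC
    print(trim_3prime_adapter("ACGTACGTACAGATC"))  # ACGTACGTAC
```

The `min_overlap` cut-off is the usual trade-off: a very short partial match at the read end can occur by chance, so a larger value trims less aggressively.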

Thanks GenoMax and @Istvan! So, would it be okay to use a scanning/trimming program without giving it these adapter sequences, since it will already look for this core sequence?

Or would you give fastp the adapters for each sample separately?

Thank you for your patience!! Amy

Reply by amy__ ▴ 190 · 2.5 years ago

First I would establish that the adapter does indeed exist.

Many adapters are automatically recognized by fastp and reported in the HTML file that gets generated by default. FastQC also recognizes a number of common adapters and shows them in the report.

Run these tools on a few samples and see what they report.

Reply by Istvan Albert 102k · 2.5 years ago
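
Applied to all ~70 samples, the suggestion above means no per-sample adapter list is needed: every sample gets the same fastp command, and each HTML report shows which adapter fastp found. A minimal sketch that only builds the commands (it does not run them). Assumptions: paired-end files named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, and the sample names and directories are hypothetical. For paired-end data fastp trims adapters by read overlap by default; `--detect_adapter_for_pe` additionally turns on sequence-based adapter detection.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def fastp_command(sample: str, indir: Path, outdir: Path, threads: int = 4) -> list[str]:
    """Build one fastp command for a paired-end sample, with no adapter given."""
    r1 = indir / f"{sample}_R1.fastq.gz"  # hypothetical naming scheme
    r2 = indir / f"{sample}_R2.fastq.gz"
    return [
        "fastp",
        "-i", str(r1), "-I", str(r2),
        "-o", str(outdir / f"{sample}_R1.trimmed.fastq.gz"),
        "-O", str(outdir / f"{sample}_R2.trimmed.fastq.gz"),
        # No --adapter_sequence: let fastp detect the adapters itself.
        "--detect_adapter_for_pe",
        # Per-sample reports; the HTML lists any detected adapter.
        "-h", str(outdir / f"{sample}.fastp.html"),
        "-j", str(outdir / f"{sample}.fastp.json"),
        "-w", str(threads),
    ]


if __name__ == "__main__":
    samples = ["sample01", "sample02"]  # hypothetical sample names
    for s in samples:
        print(shlex.join(fastp_command(s, Path("raw"), Path("trimmed"))))
```

Writing the printed lines to a file gives one shell command per sample, which a job runner such as GNU parallel (`parallel < commands.txt`) or a cluster array job can execute concurrently.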

See also similar posts on the site, for example:

illumina adapter specifying and removing using fastp

Reply by Istvan Albert 102k · 2.5 years ago

Thank you! Will give this a try

Reply by amy__ ▴ 190 · 2.5 years ago

I will put in a plug for bbduk.sh from the BBMap suite. It is also easy to use, and the software bundle includes a full set of commercially available adapter sequences in the adapters.fa file in its resources directory.

A guide to using bbduk.sh is available here: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/

Reply by GenoMax 147k · 2.5 years ago
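
Because bbduk.sh matches against the whole bundled adapters.fa, the same command also works for every sample regardless of which adapter each one used. A minimal sketch that builds (but does not run) one command per paired-end sample, using the adapter-trimming parameters from the BBDuk guide linked above. Assumptions: BBMap was unpacked into `./bbmap`, and the file naming, directories and sample names are hypothetical.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Assumption: BBMap bundle unpacked into ./bbmap; adjust to your install.
ADAPTERS = Path("bbmap/resources/adapters.fa")


def bbduk_command(sample: str, indir: Path, outdir: Path,
                  adapters: Path = ADAPTERS) -> list[str]:
    """Build a bbduk.sh adapter-trimming command for one paired-end sample."""
    r1 = indir / f"{sample}_R1.fastq.gz"  # hypothetical naming scheme
    r2 = indir / f"{sample}_R2.fastq.gz"
    o1 = outdir / f"{sample}_R1.trimmed.fastq.gz"
    o2 = outdir / f"{sample}_R2.trimmed.fastq.gz"
    return [
        "bbduk.sh",
        f"in1={r1}", f"in2={r2}",
        f"out1={o1}", f"out2={o2}",
        f"ref={adapters}",  # all bundled adapter sequences at once
        "ktrim=r",          # right-trim: remove the matching k-mer and everything 3' of it
        "k=23",             # k-mer length used for matching
        "mink=11",          # also use shorter k-mers (down to 11) at read ends
        "hdist=1",          # allow one mismatch (Hamming distance) per k-mer
        "tpe",              # trim both reads of a pair to the same length
        "tbo",              # also trim adapters found by pair-overlap detection
    ]


if __name__ == "__main__":
    for s in ["sample01", "sample02"]:  # hypothetical sample names
        print(shlex.join(bbduk_command(s, Path("raw"), Path("trimmed"))))
```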

Thank you!!

Reply by amy__ ▴ 190 · 2.5 years ago

score 2 · Accepted Answer · 2022-05-19

2.5 years ago

Istvan Albert 102k

If the samples are already split then usually the adapters are also trimmed out. That is the standard operating protocol.

Note how these indices sit far into the adapter, so fairly long stretches of other adapter sequence would also be present. For example:

CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG

So even if the adapter were present, all you would need to do is trim by the start of the sequence, CAAGCAGAAGACGGC, as that would match all of the other adapters as well.
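
The point of the answer is that everything outside the index is identical across adapters, so one shared stretch matches all of them. A small sketch that makes this concrete with the adapter layout quoted above and three hypothetical i7 indices (not from the thread): it computes the sequence shared on each side of the index. Whether reads contain an adapter as written or its reverse complement depends on the orientation in which the kit lists the oligo, so the sketch also prints the reverse complement of the shared 5' part.

```python
import os


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Adapter layout from the answer above; the i7 indices are hypothetical.
TEMPLATE = "CAAGCAGAAGACGGCATACGAGAT{i7}GTCTCGTGGGCTCGG"
HYPOTHETICAL_I7 = ["TCGCCTTA", "CTAGTACG", "TTCTGCCT"]
adapters = [TEMPLATE.format(i7=i7) for i7 in HYPOTHETICAL_I7]

# os.path.commonprefix compares plain strings character by character.
shared_5p = os.path.commonprefix(adapters)
# Common suffix = common prefix of the reversed strings, reversed back.
shared_3p = os.path.commonprefix([a[::-1] for a in adapters])[::-1]

print("shared 5' part:", shared_5p)  # CAAGCAGAAGACGGCATACGAGAT
print("shared 3' part:", shared_3p)  # GTCTCGTGGGCTCGG
print("reverse complement of shared 5' part:", reverse_complement(shared_5p))
```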

If the samples are already split then usually the adapters are also trimmed out. That is the standard operating protocol.

Not necessarily. Sequencing facilities may simply demultiplex the data and not do any trimming.

Reply by GenoMax 147k · 2.5 years ago
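
Whether a facility trimmed the data can be checked on the delivered files before choosing a tool. A rough sketch, not a replacement for the FastQC or fastp reports: it reads the first `max_reads` FASTQ records, tallies read lengths, and reports the fraction of reads containing an adapter core. The heuristic is an assumption, not from the thread: one single read length throughout is consistent with untrimmed data (though it is also what you would see if trimming found nothing to remove), and a non-zero adapter fraction means adapter sequence is still present. The default core AGATCGGAAGAGC assumes TruSeq-style libraries; `inspect_fastq` is a hypothetical helper.

```python
from __future__ import annotations

import gzip
import sys
from collections import Counter
from itertools import islice
from typing import Iterable


def inspect_fastq(lines: Iterable[str], core: str = "AGATCGGAAGAGC",
                  max_reads: int = 100_000) -> tuple[Counter, float]:
    """Return (read-length histogram, fraction of reads containing core)."""
    lengths: Counter = Counter()
    with_core = 0
    n = 0
    it = iter(lines)
    while n < max_reads:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break
        seq = record[1].strip()
        lengths[len(seq)] += 1
        if core in seq:
            with_core += 1
        n += 1
    return lengths, (with_core / n if n else 0.0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_fastq.py sample01_R1.fastq.gz
    with gzip.open(sys.argv[1], "rt") as handle:
        lengths, adapter_fraction = inspect_fastq(handle)
    print("distinct read lengths:", len(lengths))
    print("most common lengths:", lengths.most_common(5))
    print(f"reads containing adapter core: {adapter_fraction:.2%}")
```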