Question

adapter sequence positions

4.1 years ago

NGSCanBioinf ▴ 10

Hi, from my understanding, adapter sequences can only occur on the 3' side of sequencing reads if there is read-through. Does this mean that when you open a FASTQ file (either R1 or R2) to inspect it, you should see most of them on the right side of the reads? In my FASTQ files, most of the adapter sequences I see are on the left side: they span exactly half of the read and are followed by six or seven A's, and the rest of the read seems to have lower quality based on the quality values. Trimmomatic seems to be removing them, but I am curious about their source. Thanks for your help.

sequence • 1.2k views

updated 4.1 years ago by JC 13k • written 4.1 years ago by NGSCanBioinf ▴ 10
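To make the read-through premise in the question concrete: the sequencer reads from the start of the insert, so the adapter only enters a read once the read length exceeds the insert length, and it then appears at the 3' (right) end. The Python sketch below simulates that. The adapter string is the 13-base prefix shared by the TruSeq adapters, and the insert lengths, the read length, and the "N" padding are made-up illustrative values, not anything from the original data.

```python
# Minimal sketch of adapter read-through (illustrative values only).
ADAPTER = "AGATCGGAAGAGC"  # TruSeq adapter prefix; assumption for illustration

def simulate_read(insert: str, read_length: int, adapter: str = ADAPTER) -> str:
    """Return the first `read_length` bases sequenced from insert + adapter.

    If the insert is shorter than the read length, the read runs through
    into the adapter, which therefore sits at the 3' (right) end.
    Positions past the modelled adapter are padded with 'N' as a placeholder
    (hypothetical; real molecules continue with more adapter sequence).
    """
    molecule = insert + adapter
    return molecule[:read_length].ljust(read_length, "N")

long_insert = "ACGT" * 50       # 200 bp insert, longer than the read
short_insert = "ACGTTGCA" * 10  # 80 bp insert, shorter than the read

print(simulate_read(long_insert, 151).find(ADAPTER))   # -1: no adapter in the read
print(simulate_read(short_insert, 151).find(ADAPTER))  # 80: adapter starts right after the insert
```

Under this model an adapter can only start after the insert, which is why a majority of left-side adapters points to something other than ordinary read-through.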

score 1 · Answer 1 · 2020-10-06

4.1 years ago

JC 13k

It depends on the method used to generate the library, but in general, for "random" selection, you can expect adapter sequences on both sides, although the latest sequencing methods can remove them efficiently.

score 1 · Answer 2 · 2020-10-06

4.1 years ago

GenoMax 147k

That is strange indeed, unless you are using some sort of custom adapter scheme where there is a sequence at the beginning of all of your reads before the actual insert sequence starts. Otherwise, there is some sort of mispriming during sequencing, and what you are seeing are sequences from the slide adapters.

Do you know if every read in your dataset has these?

Thank you. These are WGS samples, and the overrepresented sequence is identified as "TruSeq Adapter, Index 18" by FastQC, which reports 101,779 of them (i.e. 0.16%) in the file. From rough visual inspection, 90% of these have the adapter sequence at the beginning (i.e. left side) of the read. Trimmomatic removes all 151 bases of these reads.

4.1 years ago by NGSCanBioinf ▴ 10
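The "rough visual inspection" above could be replaced by a count. Below is a minimal Python sketch, assuming plain 4-line FASTQ records and an exact match to the 13-base TruSeq prefix `AGATCGGAAGAGC` (an assumption; real trimmers also tolerate mismatches and partial matches). The helper names `fastq_records` and `adapter_positions` are hypothetical.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, Tuple

# TruSeq adapter prefix; an assumption for illustration, swap in your own adapter.
ADAPTER = "AGATCGGAAGAGC"

def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual

def adapter_positions(lines: Iterable[str], adapter: str = ADAPTER) -> Counter:
    """Count reads by where the first exact adapter match starts."""
    counts: Counter = Counter()
    for _, seq, _ in fastq_records(lines):
        pos = seq.find(adapter)
        if pos == -1:
            counts["none"] += 1
        elif pos < len(seq) // 2:
            counts["left half"] += 1
        else:
            counts["right half"] += 1
    return counts

# Tiny made-up example: r1 starts with the adapter, r2 ends with it, r3 has none.
demo = "\n".join([
    "@r1", "AGATCGGAAGAGC" + "ACGTACGTACGT", "+", "I" * 25,
    "@r2", "ACGTACGTACGT" + "AGATCGGAAGAGC", "+", "I" * 25,
    "@r3", "ACGT" * 6, "+", "I" * 24,
]) + "\n"
result = adapter_positions(io.StringIO(demo))
print(result)
```

For a real gzipped file, the same function could be fed a text-mode handle, e.g. `gzip.open("reads_R1.fastq.gz", "rt")` (hypothetical filename).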

90% of these have the adapter sequence in the beginning (i.e. left side) of the sequence.

Are your reads reverse complemented by any chance? My hunch is that they are, if 90% of reads have adapters at the beginning. I suggest you reverse complement them (use reformat.sh from the BBMap suite) and check.
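What reverse-complementing does to a FASTQ record can be shown in a few lines. This is a minimal Python sketch of the operation, not reformat.sh itself: the sequence is complemented and reversed, while the quality string is only reversed so each score stays with its base. Only A/C/G/T/N are handled; other IUPAC codes pass through unchanged.

```python
# Complement table for unambiguous bases plus N; other IUPAC codes are left unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def rc_fastq_record(header: str, seq: str, qual: str) -> tuple:
    """Reverse-complement a FASTQ record.

    The quality string is only reversed (not complemented) so that each
    score stays attached to its base.
    """
    return header, reverse_complement(seq), qual[::-1]

read = "ACGTACGTACGT" + "AGATCGGAAGAGC"  # adapter at the 3' end
header, rc_seq, rc_qual = rc_fastq_record("@r1", read, "ABCDEFGHIJKLMNOPQRSTUVWXY")
print(rc_seq)   # GCTCTTCCGATCTACGTACGTACGT
print(rc_qual)  # YXWVUTSRQPONMLKJIHGFEDCBA
```

Note that after flipping, a 3' read-through adapter sits on the left but in its reverse-complement form (`GCTCTTCCGATCT...`), so which orientation of the adapter is found on the left is itself a useful clue.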

You can also use bbduk.sh with the ktrim=l option on the data as is. This should remove only the part of the read that is adapter on the left end of the read. A guide for bbduk.sh is available here.
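The idea behind left-end trimming (drop the adapter and everything to its left, keep the rest) is sketched below in Python. This is a simplified stand-in, not bbduk's algorithm: bbduk matches k-mers and can tolerate mismatches, whereas this uses one exact match of an assumed adapter string. It contrasts with the behaviour reported above, where Trimmomatic discards the whole 151-base read.

```python
# Simplified stand-in for left-end adapter trimming; NOT bbduk's algorithm.
ADAPTER = "AGATCGGAAGAGC"  # assumption for illustration

def trim_left_adapter(seq: str, qual: str, adapter: str = ADAPTER) -> tuple:
    """Drop everything up to and including the last exact adapter match."""
    pos = seq.rfind(adapter)
    if pos == -1:
        return seq, qual  # no adapter found: keep the read unchanged
    cut = pos + len(adapter)
    return seq[cut:], qual[cut:]

seq, qual = trim_left_adapter("AGATCGGAAGAGC" + "ACGTACGT", "I" * 21)
print(seq, qual)  # ACGTACGT IIIIIIII
```

Using the last match (`rfind`) means that if the adapter occurs more than once, everything up to the rightmost copy is removed.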

4.1 years ago by GenoMax 147k

Thanks, these are good points. I will look into the scripts you mentioned. We received these files from collaborators, so we may need to check with them, but in general, is it recommended to check whether the reads are reverse complemented in any FASTQ file received from a sequencing centre? And if they are, would current aligners (e.g. bwa-mem) know how to handle them?

4.1 years ago by NGSCanBioinf ▴ 10

Aligners should handle the reads fine. You should not even need to trim them if you are aligning to a reference.

Not sure why they would have been released as reverse complements (RC). Sequencing centers generally do not do that, though bcl2fastq (Illumina's software for pre-processing data) does have an option to generate RC versions of reads.
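Why reverse-complemented reads still map: aligners consider both strands of the reference. The Python toy below shows the idea with plain exact string matching; real aligners such as bwa-mem use an index (an FM-index in bwa's case) and tolerate mismatches and gaps, so treat this only as an illustration of the two-strand search. The reference and reads are made up.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_on_either_strand(reference: str, read: str) -> Optional[Tuple[int, str]]:
    """Toy exact-match 'aligner': try the forward strand, then the reverse strand."""
    pos = reference.find(read)
    if pos != -1:
        return pos, "+"
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return pos, "-"  # read matches the opposite strand
    return None

reference = "TTTTACGGATCCAAGGTTTT"
print(find_on_either_strand(reference, "ACGGATCC"))    # (4, '+')
print(find_on_either_strand(reference, "CCTTGGATCC"))  # (6, '-')
```

A flipped read therefore just lands on the other strand at the same coordinates, which matches the point that aligners handle such reads without preprocessing.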