Question

[closed] Trimming off adaptor sequences or removing reads aligned to adaptors

0

4.1 years ago

Yifeng • 0

It seems that I have misunderstood the paper (Awkward).

For NGS gDNA sequencing (assembly/mapping), I normally trim off the adaptor sequences when they are present. This paper (https://www.nature.com/articles/nsmb.2660) instead removed all the reads that aligned to adaptors.

I am confused about the motivation for removing all the reads that aligned to adaptors. Apparently, it gives low read counts after QC.

I am not sure whether read length, platform, or library type determines whether to remove just the adaptor sequences or the whole reads. Could anyone advise, please? Thank you.

sequencing next-gen Assembly • 1.2k views

ADD COMMENT • link 4.1 years ago by Yifeng • 0

1

That is pretty unconventional. What if the adapter content is something like 10 bp on a 100 bp read? No aligner in the world would align these 10 bp, even in local mode; it is just too short. Removing whole reads would only make sense if the entire read was basically adapter, but this should not happen in a proper library. I would simply trim the adapters off as everyone else in the world does. If the reads are too short after trimming, then discard them; most trimmers have a minimum-length option.
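To illustrate the trim-then-discard idea, here is a minimal sketch in plain Python. It is not how any particular trimmer works internally: it only handles exact matches, and the names `trim_adapter`, `trim_and_filter`, `min_overlap` and `min_length`, as well as the example adapter prefix, are assumptions made for illustration.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter (exact matches only; real trimmers tolerate errors)."""
    # Case 1: the full adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter (partial adapter).
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if k > 0 and read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


def trim_and_filter(reads, adapter, min_length=20):
    """Trim every read, then drop reads shorter than min_length."""
    kept = []
    for read in reads:
        trimmed = trim_adapter(read, adapter)
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter prefix, assumed for illustration
    reads = [
        "ACGT" * 10 + "AGATCGGAAG",        # 10 bp of adapter at the 3' end
        "ACGTACGT" + adapter + "TTTT",     # short insert, mostly adapter
        "ACGT" * 10,                       # no adapter
    ]
    for r in trim_and_filter(reads, adapter):
        print(len(r), r)
```

The first read keeps its 40 bp insert, while the second is discarded because only 8 bp remain after trimming. That is the case where trimming and whole-read removal end up doing the same thing.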

ADD REPLY • link 4.1 years ago by ATpoint 85k

0

~~Where does it say in this paper that they removed all reads that aligned to adaptors?~~ I briefly read through the methods section and there is nothing about this there. Edit: This information was further down in a different section as pointed out by OP.

ADD REPLY • link 4.1 years ago by GenoMax 147k

0

I pasted part of the method below.

"Deep sequencing and quality control (QC). The libraries were sequenced on the Illumina HiSeq 2000 platform using the 100-bp single-end sequencing strategy.

In total, we generated 438 Gb (raw data) for 124 single-cell cDNA samples. The original image data generated by the sequencing machine were converted into sequence data via base calling (Illumina pipeline CASAVA v1.8.0) and then subjected to standard QC criteria to remove all of the reads that fit any of the following parameters:

1 The reads that aligned to adaptors or primers with no more than two mismatches.

2 The reads with more than 10% unknown bases (N bases).

3 The reads with more than 50% of low-quality bases (quality value ≤ 5) in one read.

... ..." Hope this helps.
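For readers who want to see the three quoted criteria as code, here is a rough sketch. The paper does not say how "aligned to adaptors" was computed, so the ungapped sliding-window comparison below is an assumption, as are the function names `min_mismatches` and `fails_qc` and the example adapter.

```python
def min_mismatches(read: str, adapter: str) -> int:
    """Fewest mismatches between the adapter and any same-length window of the read."""
    n, m = len(read), len(adapter)
    if n < m:
        return m  # adapter cannot fit in the read; treat as no match
    return min(
        sum(a != b for a, b in zip(read[i:i + m], adapter))
        for i in range(n - m + 1)
    )


def fails_qc(seq, quals, adapters,
             max_mismatches=2, max_n_frac=0.10, low_q=5, max_low_q_frac=0.50):
    """Return True if the read meets any of the three removal criteria."""
    # 1. Matches an adaptor/primer with no more than two mismatches
    #    (assumption: ungapped match anywhere in the read).
    if any(min_mismatches(seq, a) <= max_mismatches for a in adapters):
        return True
    # 2. More than 10% unknown bases.
    if seq.upper().count("N") > max_n_frac * len(seq):
        return True
    # 3. More than 50% of bases with quality value <= 5.
    low = sum(q <= low_q for q in quals)
    return low > max_low_q_frac * len(quals)


if __name__ == "__main__":
    adapters = ["AGATCGGAAGAGC"]  # example sequence, assumed for illustration
    clean = "ACGT" * 10
    print(fails_qc(clean, [30] * 40, adapters))
    print(fails_qc("ACGTACGTAC" + "AGATCGGTAGAGC" + "ACGT", [30] * 27, adapters))
```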

ADD REPLY • link 4.1 years ago by Yifeng • 0

1

Having an entire read match an adaptor/primer sequence with no more than 2 mismatches indicates that these reads likely came from primer dimers or from very short inserts that were read through. Removing them is what a scan/trim program would normally do anyway.

"Finally, 371.9 Gb (84.9%) of filtered reads were left for further analysis after QC. From these 371.9 Gb of filtered reads, 352.2 Gb (80.4%) of data were mapped to the RefSeq, Ensembl, lncRNA and hg19 reference databases"

That is not a low number of reads left after QC.
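A quick check of the quoted figures, using only the numbers from the paper (438 Gb raw, 371.9 Gb after QC, 352.2 Gb mapped). The variable names are mine. Both quoted percentages turn out to be relative to the raw data.

```python
raw_gb = 438.0       # raw data, from the quoted methods
filtered_gb = 371.9  # left after QC
mapped_gb = 352.2    # mapped to the reference databases

pct_kept = 100 * filtered_gb / raw_gb
pct_mapped_of_raw = 100 * mapped_gb / raw_gb
pct_mapped_of_filtered = 100 * mapped_gb / filtered_gb

print(f"kept after QC:        {pct_kept:.1f}% of raw")
print(f"mapped:               {pct_mapped_of_raw:.1f}% of raw")
print(f"mapped (of filtered): {pct_mapped_of_filtered:.1f}%")
```

So only about 15% of the data was removed by all three QC criteria combined, and roughly 95% of what remained was mapped.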

ADD REPLY • link 4.1 years ago by GenoMax 147k

0

Thanks. I got you. I misunderstood the paper. I thought it meant that the alignment has no more than two mismatches.

ADD REPLY • link 4.1 years ago by Yifeng • 0

0

You understood it right. An alignment with "no more than two mismatches" means that the read matched the primer/adapter sequence at N-2 or more of its N bases.

ADD REPLY • link 4.1 years ago by GenoMax 147k

score 0 · Answer 1 · 2020-10-28

0

4.1 years ago

JC 13k

As you said, it's common to trim adaptors, not to completely remove reads that align to them. I'm not sure about the mentioned paper (it is not open access), but this will definitely remove reads that could bias the analysis. BTW, some platforms (Illumina) efficiently remove adapters after base calling. I generally just check whether the library needs trimming; in the past year I have only trimmed data from non-Illumina vendors.
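One simple way to check whether a library needs trimming is to estimate how many reads contain the start of the adapter. This is only a sketch: the function name `adapter_fraction`, the `seed_len` parameter and the example adapter are assumptions, and dedicated QC tools report this more thoroughly.

```python
def adapter_fraction(reads, adapter, seed_len=12):
    """Fraction of reads containing the first seed_len bases of the adapter (exact match)."""
    seed = adapter[:seed_len]
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(seed in read for read in reads)
    return hits / len(reads)


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example sequence, assumed for illustration
    sample = [
        "ACGT" * 5 + adapter,
        "ACGT" * 8,
        "TTTT" * 8,
        "GGGG" + adapter + "ACAC",
    ]
    print(f"{adapter_fraction(sample, adapter):.0%} of reads contain the adapter seed")
```

If the fraction is close to zero on a sample of reads, trimming will change very little; a noticeable fraction suggests running a trimmer first.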

ADD COMMENT • link 4.1 years ago by JC 13k

0

@JC you should be able to find the paper via your institution. The link above has been fixed.

ADD REPLY • link 4.1 years ago by GenoMax 147k