Question

How do I completely get rid of adapter sequence

2

7.0 years ago

MAPK ★ 2.1k

I have single-end small RNA-seq data and need to trim the adapter (adapter sequence: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC). After trimming with the bbduk command shown further down, I checked the reads and they still carry adapter overhangs.

The adapter overhangs can be seen below:

cat bbduk_trimmed_small_RNA_001.fastq | grep TCGCAGGGAAATCATCTGATTA

TCGCAGGGAAATCATCTGATTAGATCGGAAGA
TCGCAGGGAAATCATCTGATTAGATCGGAA
TCGCAGGGAAATCATCTGATTAAGATCGGAAGAA
TTCGCAGGGAAATCATCTGATTAGAA

or can be seen here: https://postimg.cc/image/knuvtyv0d/

This trimming was done using bbduk.sh (BBTools: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/). The NEB-SE.fa file contains the adapter sequence mentioned above.

bbduk.sh -Xmx1g in=small_RNA_001.fastq out=bbduk_trimmed_small_RNA_001.fastq ref=NEB-SE.fa ktrim=r k=13 mink=6 minlength=18 hdist=0

I could reduce the k-mer size to k=4, but that would risk trimming false positives. How can I completely get rid of these adapter sequences from my data without trimming false positives?
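To put a rough number on the false-positive worry, here is a minimal Python sketch that estimates how many exact adapter k-mer hits a read would pick up purely by chance. It assumes every base is independent and uniformly random, which real small RNA reads are not, so treat the figures as an order-of-magnitude guide only. The function names and the 32-nt read length are illustrative choices, and this is not how BBDuk computes anything internally.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # adapter sequence from the question


def adapter_kmers(adapter, k):
    """Return the set of distinct k-mers contained in the adapter."""
    return {adapter[i:i + k] for i in range(len(adapter) - k + 1)}


def expected_chance_hits(read_length, k, adapter=ADAPTER):
    """Expected number of exact adapter k-mer hits in one random read.

    Assumption: bases are independent and uniform over A/C/G/T, so a given
    read position matches one specific k-mer with probability 4**-k.
    """
    positions = max(read_length - k + 1, 0)  # k-mer start positions in the read
    return positions * len(adapter_kmers(adapter, k)) / 4 ** k


if __name__ == "__main__":
    # 32 nt is an illustrative read length, matching the longest example above.
    for k in (4, 6, 13):
        hits = expected_chance_hits(32, k)
        print(f"k={k:>2}: {hits:.2e} expected chance hits per 32-nt read")
```

Under this simplified model, k=4 gives more than one expected chance hit per read, while k=13 gives a vanishingly small number, which is the trade-off between leftover adapter and over-trimming.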

smallRNAseq trimming adapter • 2.5k views

score 4 · Accepted Answer · 2018-07-29

4

7.0 years ago

GenoMax 153k

You can't have it both ways. You can either be strict about trimming and risk losing a few reads, or accept a few reads that still carry a bit of adapter. Your aligner should deal with any extraneous bases during alignment. Have you tried aligning this data to see what fraction aligns? With small RNA you are looking for a specific length (~21-25 bp). If your reads fail that length criterion after trimming, you may want to discard them, since they may not be what you are looking for.
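As an illustration of the "trim strictly, then filter on length" idea, here is a minimal Python sketch applied to the four reads from the question. It uses plain exact suffix/prefix matching with a minimum overlap, which is a generic stand-in and not BBDuk's k-mer algorithm. The 18-25 nt window is an assumption, combining the minlength=18 from the question with the upper end of the ~21-25 range mentioned above, and the function names are made up for this example.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # adapter sequence from the question


def trim_3prime_adapter(read, adapter=ADAPTER, min_overlap=6):
    """Cut the adapter off the 3' end of a read, exact matches only.

    A complete adapter copy inside the read is cut first. Otherwise the
    longest read suffix equal to an adapter prefix of at least
    min_overlap bases is removed. Mismatches are not tolerated here.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    full = read.find(adapter)
    if full != -1:
        return read[:full]
    for overlap in range(min(len(read), len(adapter)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


def passes_length_filter(seq, min_len=18, max_len=25):
    """Keep only sequences inside the assumed small RNA length window."""
    return min_len <= len(seq) <= max_len


if __name__ == "__main__":
    reads = [
        "TCGCAGGGAAATCATCTGATTAGATCGGAAGA",
        "TCGCAGGGAAATCATCTGATTAGATCGGAA",
        "TCGCAGGGAAATCATCTGATTAAGATCGGAAGAA",
        "TTCGCAGGGAAATCATCTGATTAGAA",
    ]
    for read in reads:
        trimmed = trim_3prime_adapter(read)
        verdict = "keep" if passes_length_filter(trimmed) else "discard"
        print(f"{trimmed}\t{len(trimmed)} nt\t{verdict}")
```

In this sketch the first two reads have an exact adapter prefix at their 3' end and are trimmed, while the last two do not match exactly, stay untrimmed, and then fall outside the length window.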

BTW: I am curious as to what the original reads in the example you included look like (or is that the full original sequence)?

0

Yes, that's the full original sequence in the example. I had separated my data into aligned and unaligned fastq files, randomly selected that read from the aligned fastq file, and searched for it in the unaligned fastq dataset. It turns out the unaligned dataset also has this read, but with adapter overhangs.

ADD REPLY • link 7.0 years ago by MAPK ★ 2.1k

0

I did align the dataset using bowtie with the -v 1 option, which allows one mismatch. Is that what you were referring to when you said "Your aligner should deal with any extraneous bases while it does the alignment."?

ADD REPLY • link 7.0 years ago by MAPK ★ 2.1k

0

It would appear that bowtie is not able to soft-clip the adapter sequences.

Since you have BBTools installed, can you align your data with bbmap.sh using ambig=all vslow perfectmode maxsites=1000? @Brian recommended these parameters with this note: "It should be very fast in that mode (despite the vslow flag). Vslow mainly removes masking of low-complexity repetitive kmers, which is not usually a problem but can be with extremely short sequences like microRNAs."
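For anyone scripting this step, a minimal Python sketch of assembling that bbmap.sh call might look like the following. The flags are the ones listed above; the reference and output file names (reference.fa, mapped.sam) are placeholders, and the helper name is made up for this example. The script only prints the command, so nothing runs unless you pass the list to subprocess yourself.

```python
import shlex


def build_bbmap_command(reads, reference, output):
    """Assemble a bbmap.sh argument list with the flags suggested above."""
    return [
        "bbmap.sh",
        f"in={reads}",
        f"ref={reference}",
        f"out={output}",
        "ambig=all",
        "vslow",
        "perfectmode",
        "maxsites=1000",
    ]


if __name__ == "__main__":
    # reference.fa and mapped.sam are hypothetical file names.
    cmd = build_bbmap_command(
        "bbduk_trimmed_small_RNA_001.fastq", "reference.fa", "mapped.sam"
    )
    print(shlex.join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```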

ADD REPLY • link 7.0 years ago by GenoMax 153k