I have single-end small RNA-seq data and I need to trim the adapter (adapter sequence: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC). After trimming the adapter with the bbduk command given further down, I checked the reads in my data and some of them still carry overhangs of the adapter.
The adapter overhangs can be seen below:
cat bbduk_trimmed_small_RNA_001.fastq | grep TCGCAGGGAAATCATCTGATTA
TCGCAGGGAAATCATCTGATTAGATCGGAAGA
TCGCAGGGAAATCATCTGATTAGATCGGAA
TCGCAGGGAAATCATCTGATTAAGATCGGAAGAA
TTCGCAGGGAAATCATCTGATTAGAA
or can be seen here: https://postimg.cc/image/knuvtyv0d/
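To put a number on the leftover overhangs rather than eyeballing grep output, a rough check along these lines might help. It is only a sketch: adapter_overhangs is a hypothetical helper (not part of BBTools), it assumes a plain FASTQ with four lines per record, and it only reports exact matches between the 3' end of a read and the start of the adapter, so overhangs containing a mismatch will not show up.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of BBTools.
# Prints every read whose 3' end exactly matches a prefix of the adapter
# that is at least MIN_OVERLAP bases long, followed by the overlap length.
# Assumes a plain 4-line-per-record FASTQ file.
adapter_overhangs() {
    local adapter=$1 min_overlap=$2 fastq=$3
    awk -v adapter="$adapter" -v min="$min_overlap" '
        NR % 4 == 2 {                        # sequence line of each record
            readlen = length($0)
            max = length(adapter) < readlen ? length(adapter) : readlen
            for (n = max; n >= min; n--) {   # try the longest overlap first
                if (substr($0, readlen - n + 1) == substr(adapter, 1, n)) {
                    print $0 "\t" n          # read, tab, overlap length
                    break
                }
            }
        }' "$fastq"
}
```

For example, adapter_overhangs AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC 6 bbduk_trimmed_small_RNA_001.fastq | wc -l would count the reads that still end in at least 6 exact adapter bases. The threshold of 6 is chosen here only to mirror mink=6.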
This trimming was done using bbduk.sh (bbmap tools: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/). The NEB-SE.fa file contains the adapter sequence given above.
bbduk.sh -Xmx1g in=small_RNA_001.fastq out=bbduk_trimmed_small_RNA_001.fastq ref=NEB-SE.fa ktrim=r k=13 mink=6 minlength=18 hdist=0
I could reduce the kmer size to k=4, but that would risk trimming false positives. How can I completely get rid of these adapter sequences from my data without trimming false positives?
Yes, that's the full original sequence in the example. I had separated my data into aligned and unaligned FASTQ files, randomly selected that read from the aligned FASTQ file, and looked for it in the unaligned FASTQ dataset. It turns out the unaligned dataset also has this read, but with adapter overhangs.
I did align the dataset using bowtie with the -v 1 option, which allows for one mismatch. Is that what you were referring to when you said "Your aligner should deal with any extraneous bases while it does the alignment."?

It would appear that bowtie is not able to soft-clip the adapter sequences. Since you have BBTools installed, can you align your data with bbmap.sh using ambig=all vslow perfectmode maxsites=1000? (@Brian recommended these parameters with this note: It should be very fast in that mode (despite the vslow flag). Vslow mainly removes masking of low-complexity repetitive kmers, which is not usually a problem but can be with extremely short sequences like microRNAs.)