Hi All,
I am trying to trim this adapter (NEB-SE.fa = AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC) with Trimmomatic, using the command below:
java -jar trimmomatic-0.36.jar SE -phred33 seqL_1_GACGAC_L003_R1_001.fastq trimmed_output.fastq ILLUMINACLIP:NEB-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18
It trims the full-length adapter:
cat trimmed_output.fastq | head -n 20000 | grep AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
And it doesn't trim sequences below length 18 (since I have used the parameter 'MINLEN:18'):
cat trimmed_output.fastq | head -n 20000 | grep AGATCGGAAGAGCA
TCTGTAATACTCTAACTTGGAAGATCGGAAGAGCACGCGTCTGAACTCCA
AAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGAGATCGGAAGAGCA
TCTTTGATGGCCTAACGGTCATGATTTCCGCTTGTAGATCGGAAGAGCAC
ACATATTCTTTTTAAGAGTATAAAATTGACTGTGAGATCGGAAGAGCACA
CGCGGGAGACCGGGGTTCAATTCCCCGTACCGGAACAGATCGGAAGAGCA
TCCGAATTAGTGTAGGGGTTAACATAACTCGCTTTAGATCGGAAGAGCAC
TCTTTGATGGTCTAACGGTCATGATTTCCGCTTGAGATCGGAAGAGCAAC
I did a BLAST search against the genome and I can see that the sequence AGATCGGAAGAG is present in the genome, but the sequence AGATCGGAAGAGCA is not (confirming that this sequence is part of the adapter). How do I accurately trim the adapter sequence without trimming genomic sequence or, in this case, choose MINLEN accurately? Your help would be greatly appreciated.
Sorry, my original question was a bit confusing, so I have just edited it. I want to understand how to accurately choose the minimum length of adapter to be trimmed. Should I first do a BLAST search against the genome (as mentioned above) and use that to determine the minimum length for adapter trimming?
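To make the trade-off in this question concrete, here is a minimal Python sketch of 3' adapter detection with a minimum-overlap threshold. It is not how Trimmomatic works internally; the function name `find_adapter_start` and the parameters `min_overlap` and `max_mismatches` are made up for illustration. It compares the tail of each read against the start of the adapter and reports where the adapter appears to begin.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

def find_adapter_start(read, adapter=ADAPTER, min_overlap=8, max_mismatches=1):
    """Return the index in `read` where the adapter seems to start, or None.

    Hypothetical helper for illustration only. Each candidate start is
    tried from left to right; the rest of the read is compared with the
    beginning of the adapter and accepted if it has few enough mismatches.
    """
    for start in range(len(read) - min_overlap + 1):
        # Compare at most one adapter length of read sequence.
        overlap = min(len(read) - start, len(adapter))
        mismatches = sum(
            1
            for a, b in zip(read[start:start + overlap], adapter[:overlap])
            if a != b
        )
        if mismatches <= max_mismatches:
            return start
    return None

if __name__ == "__main__":
    # One of the reads from the grep output above.
    read = "AAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGAGATCGGAAGAGCA"
    start = find_adapter_start(read)
    if start is not None:
        print("adapter starts at", start, "trimmed read:", read[:start])
```

Lowering `min_overlap` catches shorter adapter remnants but also increases chance matches against genuine sequence: an exact 3-base overlap, for example, is expected in roughly 1 of 64 random read ends. A BLAST check like the one described above is one way to decide how short an overlap is still safe for a given genome.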
Since AGATCGGAAGAGCA is the core sequence you want to identify, you can start with a k of about half the length of that core sequence. You could reduce it further, but then time/memory requirements would likely go up. The Trimmomatic manual describes this process on pages 6-7.
Thank you so much. So if AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC is my adapter sequence (length=34), are you saying that I can choose k=34/2=17? Could you please clarify this?
Imagine taking pieces of that string and trying to find an initial alignment. The smaller the piece, the more easily you will find the initial alignment. So for that adapter you can choose a piece somewhere between a third and half of its length as a start.
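The "pieces of that string" idea can be sketched in Python as follows. This is only an illustration of k-mer seeding, not the actual implementation of Trimmomatic or bbduk.sh; the helper names `adapter_kmers` and `first_seed_hit` are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

def adapter_kmers(adapter, k):
    """Return the set of all substrings of length k in the adapter."""
    return {adapter[i:i + k] for i in range(len(adapter) - k + 1)}

def first_seed_hit(read, adapter=ADAPTER, k=17):
    """Return the first read position whose k-mer occurs in the adapter.

    Returns None when no k-mer of the read matches, e.g. when fewer
    than k adapter bases are present at the end of the read.
    """
    kmers = adapter_kmers(adapter, k)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            return i
    return None

if __name__ == "__main__":
    # Read from the question that ends with only 14 adapter bases.
    read = "AAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGAGATCGGAAGAGCA"
    for k in (17, 11):
        print("k =", k, "-> seed hit at", first_seed_hit(read, k=k))
```

With k=17 a read carrying only 14 adapter bases can never produce a seed, while a smaller k such as 11 can, which is why a k between a third and half of the adapter length is a reasonable starting point.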
Thank you. I was able to work it out. I have also used the minimum and maximum length options built into bbduk.sh to restrict the reads to 18-30 bases, so I only have small RNAs in my data.
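For anyone who wants to apply the same 18-30 base size selection outside bbduk.sh, a rough Python equivalent might look like the sketch below. It assumes a plain 4-lines-per-record FASTQ; the function names `read_fastq` and `filter_by_length` and the file name in the usage comment are hypothetical, and the exact bbduk.sh options used are not given in this thread.

```python
from itertools import islice

def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    handle = iter(handle)
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield tuple(record)

def filter_by_length(records, min_len=18, max_len=30):
    """Keep only records whose sequence length is within [min_len, max_len]."""
    for header, seq, plus, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, plus, qual

# Example usage (hypothetical file name):
# with open("trimmed_output.fastq") as fh:
#     for rec in filter_by_length(read_fastq(fh)):
#         print("\n".join(rec))
```

Both bounds are inclusive here, so reads of exactly 18 or 30 bases are kept.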