Question

How do you decide the minimum length for the adapter that needs trimming for small RNAseq?

0

7.0 years ago

MAPK ★ 2.1k

Hi All, I am trying to trim this adapter (NEB-SE.fa = AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC) with Trimmomatic, using the command below:

java -jar trimmomatic-0.36.jar SE -phred33 seqL_1_GACGAC_L003_R1_001.fastq trimmed_output.fastq ILLUMINACLIP:NEB-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18

It trims the full-length adapter:

cat trimmed_output.fastq | head -n 20000 | grep AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

But it doesn't trim sequences below length 18 (since I have used the parameter 'MINLEN:18'):

cat trimmed_output.fastq | head -n 20000 | grep AGATCGGAAGAGCA

TCTGTAATACTCTAACTTGGAAGATCGGAAGAGCACGCGTCTGAACTCCA
AAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGAGATCGGAAGAGCA
TCTTTGATGGCCTAACGGTCATGATTTCCGCTTGTAGATCGGAAGAGCAC
ACATATTCTTTTTAAGAGTATAAAATTGACTGTGAGATCGGAAGAGCACA
CGCGGGAGACCGGGGTTCAATTCCCCGTACCGGAACAGATCGGAAGAGCA
TCCGAATTAGTGTAGGGGTTAACATAACTCGCTTTAGATCGGAAGAGCAC
TCTTTGATGGTCTAACGGTCATGATTTCCGCTTGAGATCGGAAGAGCAAC

I ran a BLAST search against the genome and can see that the sequence AGATCGGAAGAG is present in the genome, but AGATCGGAAGAGCA is not (confirming that this sequence is part of the adapter). How do I accurately trim the adapter sequence without trimming genomic sequence, or in this case choose MINLEN accurately? Your help would be greatly appreciated.

trimmomatic trimming adapter smallRNAseq • 3.3k views

ADD COMMENT • link 7.0 years ago by MAPK ★ 2.1k

score 0 · Answer 1 · 2018-07-20

0

7.0 years ago

GenoMax 152k

I thought this answered your question about adapters: A: Adapter trimming of small RNAseq data

You could use literal=AGATCGGAAGAGCA with bbduk.sh and a k of 6 or so.

ADD COMMENT • link 7.0 years ago by GenoMax 152k

0

Sorry, my original question was a bit confusing; I have just edited it. I want to understand how to accurately choose the minimum length of the adapter to be trimmed. Should I first run a BLAST search against the genome (as I mentioned above) to determine the minimum length for adapter trimming?
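The genome check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy genome string are hypothetical, the genome is assumed to fit in memory as one uppercase string, and only exact matches on both strands are considered (BLAST would also report inexact hits).

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

def reverse_complement(seq):
    """Reverse-complement an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def shortest_absent_prefix(adapter, genome, min_len=8):
    """Return the shortest adapter prefix found on neither genome strand, or None."""
    for n in range(min_len, len(adapter) + 1):
        prefix = adapter[:n]
        if prefix not in genome and reverse_complement(prefix) not in genome:
            return prefix
    return None

# Hypothetical toy genome: it contains AGATCGGAAGAG followed by a non-adapter base.
toy_genome = "TTGACCAGATCGGAAGAGTTCAGGCTA"
prefix = shortest_absent_prefix(ADAPTER, toy_genome)
print(len(prefix), prefix)
```

Any adapter piece at least as long as the returned prefix cannot be an exact genomic match, so its length is a reasonable lower bound for the adapter match length.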

ADD REPLY • link 7.0 years ago by MAPK ★ 2.1k

1

Since AGATCGGAAGAGCA is the core sequence you want to identify, you can start with a k of about half the length of that core sequence. You could reduce it further, but then time/memory requirements would likely go up.

The Trimmomatic manual describes this process on pages 6-7.

ADD REPLY • link 7.0 years ago by GenoMax 152k

0

Thank you so much. So if AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC is my adapter sequence (length=34), are you saying that I can choose k=34/2 =17? Could you please clarify this?

ADD REPLY • link 7.0 years ago by MAPK ★ 2.1k

1

Imagine taking pieces of that string and trying to find an initial alignment. The smaller the piece, the easier it is to find that initial alignment. So for that adapter you can start with a piece somewhere between a third and half of its length.
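The effect of the piece size can be illustrated with a small Python sketch. It is a simplified model, not how bbduk.sh or Trimmomatic is implemented: the function names are hypothetical, only exact k-mer matches are used, and the read is cut at the first position where any adapter k-mer occurs.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

def adapter_kmers(adapter, k):
    """All substrings of length k ("pieces") of the adapter."""
    return {adapter[i:i + k] for i in range(len(adapter) - k + 1)}

def trim_at_first_seed(read, adapter, k):
    """Cut the read at the first exact match to any adapter k-mer."""
    seeds = adapter_kmers(adapter, k)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in seeds:
            return read[:i]
    return read  # no seed found: read is left untouched

# Read from the question; it ends with only 14 adapter bases (AGATCGGAAGAGCA).
read = "AAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGAGATCGGAAGAGCA"
for k in (17, 12):  # about half and about a third of the 34-base adapter
    trimmed = trim_at_first_seed(read, ADAPTER, k)
    print(k, len(trimmed), trimmed)
```

With k=17 the 14-base adapter remnant is too short to contain a full piece, so the read is left alone; with k=12 the remnant is found and removed. The trade-off is that shorter pieces are also more likely to match genuine genomic sequence.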

ADD REPLY • link 7.0 years ago by GenoMax 152k

0

Thank you. I was able to work it out. I have also set the minimum and maximum read lengths in bbduk.sh to 18-30 bases so that only small RNAs remain in my data.
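For illustration, the 18-30 base length filter amounts to the following Python sketch. The function name and the example list are hypothetical; bbduk.sh applies its own length options during trimming, which are not reproduced here.

```python
def keep_small_rna(reads, min_len=18, max_len=30):
    """Keep only trimmed reads whose length lies within [min_len, max_len]."""
    return [r for r in reads if min_len <= len(r) <= max_len]

# Toy input: inserts left after adapter removal (illustrative only).
trimmed_reads = [
    "TCTGTAATACTCTAACTTGGA",                 # 21 bases: kept
    "ACGTACGT",                              # 8 bases: too short
    "AAACTTTCAACAACGGATCTCTTGGTTCTGGCATCG",  # 36 bases: too long
]
print(keep_small_rna(trimmed_reads))
```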

ADD REPLY • link 7.0 years ago by MAPK ★ 2.1k