Question

Adapter trimming of small RNAseq data

6.5 years ago

MAPK ★ 2.1k

Hi all, I have a question about adapter trimming of small RNA-seq data. The library for this dataset was prepared with the NEBNext Multiplex Small RNA Sample Prep Set for Illumina (E7300S/L: https://www.neb.com/-/media/catalog/datacards-or-manuals/manuale7300.pdf). I used bbduk.sh from BBTools (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) with the following command:

bbduk.sh -Xmx1g in=Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq out=/media/owner/7ef86942-96a5-48a7-a325-6c5e1aec7408/trimmed_files/bbmap_trimmed/clean_Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq  ref=NEB-SE_5_and_3_Prime.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

The adapter file NEB-SE_5_and_3_Prime.fa contains both 5' and 3' adapters:

>NEB_sRNA_read_1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>NEB_sRNA_read_2
AGATCGGAA

The problem is with the trimmed file. The first adapter is no longer found:

cat clean_Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq | head -n 20000 | grep AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
owner@owner-HP-Z840-Workstation[bbmap_trimmed]

but it is still showing the second adapter:

owner@owner-HP-Z840-Workstation[bbmap_trimmed] cat clean_Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq | head -n 1000 | grep AGATCGGAA
TTTCTCTGAGCACTCCTTAGTACAAGATCGGAAGAGCACACGTCGAACTC
AAATGTTCTGAGGACTGGTTCTAGATCGGAAGAGCACCGTCTGAACTCCA
GATGGGCCCCGGGTTCGATTCCCGGCGAACGCACCAGATCGGAAGAGCCA
TTGGACGTGTTATTTTCAGACAAGATCGGAAGAAGCACACGTCTGAACTC
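A note on the check itself: `grep` over the whole FASTQ also scans header and quality lines, and `head -n 1000` covers only 250 reads. A minimal sketch that restricts the search to sequence lines is shown here. It assumes a standard 4-line-per-record FASTQ; the function names and the tiny demo file (two reads from the output above plus one adapter-free read) are illustrative, not part of the original post.

```shell
#!/usr/bin/env bash
# Count reads whose *sequence line* contains an adapter string.
# Assumes standard 4-line FASTQ records (header, sequence, '+', quality).
count_adapter_reads() {
    local adapter="$1" fastq="$2"
    awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) > 0 { n++ } END { print n + 0 }' "$fastq"
}

# Demo helper only: emit one FASTQ record with dummy 'I' quality values.
fastq_record() {
    local name="$1" seq="$2"
    printf '@%s\n%s\n+\n%s\n' "$name" "$seq" "${seq//?/I}"
}

demo_fq="$(mktemp)"
{
    fastq_record r1 TTTCTCTGAGCACTCCTTAGTACAAGATCGGAAGAGCACACGTCGAACTC
    fastq_record r2 TTGGACGTGTTATTTTCAGACAAGATCGGAAGAAGCACACGTCTGAACTC
    fastq_record r3 TTTCTCTGAGCACTCCTTAGTACA
} > "$demo_fq"

full_hits="$(count_adapter_reads AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC "$demo_fq")"
short_hits="$(count_adapter_reads AGATCGGAA "$demo_fq")"
echo "full 34-mer: $full_hits reads; 9-mer: $short_hits reads"
rm -f "$demo_fq"
```

On the demo reads the full 34-base adapter is never found intact (it is cut off by the 50-base read length or carries a sequencing error), while the 9-base string is still present in both untrimmed reads.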

Can someone please help me understand whether I need to remove both of these adapters before downstream/expression analysis? I have been using btrim to trim adapters from RNA-seq data (where I never had to provide an adapter file), but this is the first time I am doing it with bbmap (and also with trimmomatic) for small RNA-seq data. For small RNA-seq data, do we normally trim both the 5' and 3' adapters, with both adapter sequences in the adapter file used for trimming? Thank you in advance for your help.

smallRNAseq RNA-Seq adapter trimming • 6.8k views

updated 4.0 years ago by Biostar 20 • written 6.5 years ago by MAPK ★ 2.1k

It is possible that you may need to do two rounds of trimming. Are there specific directions for this kit as far as the bioinformatic analysis is concerned? I have used a BioO kit which required removal of the adapters followed by a hard trim of a certain number of base pairs from one of the ends leaving ~22-25 bp final miRNA read.

If there are no specific data handling directions, it would help if you draw the structure of the fragment that is generated after the adapters are ligated. It would orient you as to what to expect in the actual sequence and the steps (more than one may be needed) to remove the adapters/extraneous sequence.
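One way to check whether trimming left reads in the expected small RNA size range is to tabulate read lengths after trimming. This is a minimal sketch assuming a standard 4-line FASTQ; the function names are illustrative, and the demo reads are the adapter-free portions of three reads shown in the question.

```shell
#!/usr/bin/env bash
# Tabulate read lengths (length <TAB> count) from a 4-line-per-record FASTQ.
length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (len in n) print len "\t" n[len] }' "$1" | sort -n
}

# Demo helper only: emit one FASTQ record with dummy 'I' quality values.
fastq_record() {
    local name="$1" seq="$2"
    printf '@%s\n%s\n+\n%s\n' "$name" "$seq" "${seq//?/I}"
}

demo_fq="$(mktemp)"
{
    fastq_record r1 TTTCTCTGAGCACTCCTTAGTACA
    fastq_record r2 AAATGTTCTGAGGACTGGTTCT
    fastq_record r3 TTGGACGTGTTATTTTCAGACA
} > "$demo_fq"

hist="$(length_histogram "$demo_fq")"
printf '%s\n' "$hist"
rm -f "$demo_fq"
```

A distribution that still peaks at the full read length after trimming would suggest the adapter was not removed.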

6.5 years ago by GenoMax 148k

Another thing to try is to reduce k to a smaller value, e.g. k=5, and redo the trimming (remove mink). It should catch the remaining adapter pieces at the ends of the reads. This will require more RAM, so allocate 10g to be safe.

6.5 years ago by GenoMax 148k

Thank you so much for your help. I changed the k parameter to k=9, and that removed both adapters.

6.5 years ago by MAPK ★ 2.1k

I think this thread, "why remove adapter just from 3' of the reads", together with your answer, explains why trimming only the 3' adapter should be fine.

6.5 years ago by MAPK ★ 2.1k

score 5 · Accepted Answer · 2018-07-19

6.5 years ago

Brad Langhorst ▴ 120

Hi:

I'm one of the developers of the NEBNext kits. The reads you show seem to contain the sequence of the 3' adapter as expected for small RNA. The 5' adapter sequence begins with a G (no A-tailing for this library type). The 5' adapter sequence should not be found in read 1, and I don't see it in the sequences you posted.

For our DNA Ultra, RNA Ultra and Small RNA methods, read 1 should always be trimmed with

AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

(note: this is not always true for other vendors' methods)

For the Small RNA kit, Read 2 (if present, not typically done with short inserts) should be trimmed with

GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT

The sequences of oligos used in our kits are documented at the end of our manuals.

I recommend using a simple program like flexbar [1] for single end trimming.

For paired end reads, presence of true adapter sequence requires that the insert is shorter than read length. In that scenario, both read 1 and read 2 contain information about the position of the adapter. Use of an adapter trimmer that takes advantage of this additional signal is advisable. E.g. seqprep[2], flexbar (-ap option)[1], etc.
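For the single-end case, the effect of trimming read 1 at the 3' adapter can be illustrated with a deliberately simple sketch. It cuts each read at the first exact occurrence of an adapter prefix and truncates the quality string to match. The 12-base prefix, the function names, and the demo file are assumptions for illustration; dedicated trimmers such as those named above also handle mismatches and partial adapters at the read end, which this sketch does not.

```shell
#!/usr/bin/env bash
# Cut each read at the first exact occurrence of an adapter string and
# truncate the quality line to the same length. Exact matching only.
trim_3p_adapter() {
    local adapter="$1" fastq="$2"
    awk -v a="$adapter" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0; pos = index(seq, a); keep = (pos > 0) ? pos - 1 : length(seq) }
        NR % 4 == 0 {
            print header
            print substr(seq, 1, keep)
            print "+"
            print substr($0, 1, keep)
        }' "$fastq"
}

# Demo helper only: emit one FASTQ record with dummy 'I' quality values.
fastq_record() {
    local name="$1" seq="$2"
    printf '@%s\n%s\n+\n%s\n' "$name" "$seq" "${seq//?/I}"
}

demo_fq="$(mktemp)"
{
    fastq_record r1 TTTCTCTGAGCACTCCTTAGTACAAGATCGGAAGAGCACACGTCGAACTC
    fastq_record r2 GATGGGCCCCGGGTTCGATTCCCGGCGAACGCACCAGATCGGAAGAGCCA
} > "$demo_fq"

# First 12 bases of the read 1 adapter (illustrative choice of prefix length).
trimmed="$(trim_3p_adapter AGATCGGAAGAG "$demo_fq")"
printf '%s\n' "$trimmed"
rm -f "$demo_fq"
```

The two demo reads are taken from the question; after the cut, 24 and 35 bases of insert remain.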

[1] Roehr, J. T., Dieterich, C., & Reinert, K. (2017). Flexbar 3.0 - SIMD and multicore parallelization. Bioinformatics, 33(18), 2941–2942. http://doi.org/10.1093/bioinformatics/btx330 https://github.com/seqan/flexbar

[2] https://github.com/jstjohn/SeqPrep

Hi Brad, Thanks for replying to my question. So just to be clear: I am trimming 5' or the read1 (NEB-SE.fa with AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC) using trimmomatic as below:

java -jar trimmomatic-0.36.jar SE -phred33 Ago2.fq ILLUMINACLIP:NEB-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18

Should I repeat the trimming for the 3' adapter as well (with adapter: GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT)? So far I have used trimmomatic to trim read 1 and then run ShortStack (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3683909/) on all trimmed samples merged together to get clusters of reads aligned to the genome, while also trimming the 9-base adapter (AGATCGGAA) with ShortStack's built-in adapter trimming, as below:

perl ShortStack --readfile merged.fastq --genomefile chromosomes.fasta --bowtie_cores 20 --adapter AGATCGGAA

So are you saying that instead of AGATCGGAA I should be trimming the GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT sequence? My data is single end and I still don't understand the relevance of trimming the 3' adapter. I would really appreciate it if you could clarify this.

6.5 years ago by MAPK ★ 2.1k

"Read 2 (if present, not typically done with short inserts)"

If you don't have read 2 then you don't need to do the second trim step.

6.5 years ago by GenoMax 148k

The primer used by the instrument for sequencing sits down immediately before the insert, so there is no need to trim anything from that end in this scenario. You would only ever trim the GATCGTCGG... sequence if you performed paired end sequencing. Paired end sequencing of small insert libraries is only typically done when people want to sequence a small RNA library in a pool with longer insert libraries or when they want the absolute highest quality data. It actually does not cost much more (just instrument time) since you can reliably get 2 x 32 base reads out of a 50 cycle sequencing kit, but the quality improvement is not dramatic. In general, I think it's best to use the entire adapter/primer sequence up to the index portion for maximum sensitivity and minimum overtrimming.
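The sensitivity/overtrimming trade-off can be made concrete with a back-of-the-envelope estimate. Under the simplifying assumption of uniformly random, independent bases (an assumption made here, not something stated in the thread), an exact k-base string is expected to occur by chance about (L - k + 1) / 4^k times in a read of length L. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Expected number of chance exact matches of a k-base string in a read of
# length L, assuming uniformly random, independent bases (a simplification).
expected_chance_hits() {
    local read_len="$1" k="$2"
    awk -v L="$read_len" -v k="$k" 'BEGIN {
        positions = L - k + 1
        if (positions < 0) positions = 0
        printf "%.2e\n", positions * (0.25 ^ k)
    }'
}

short_adapter="$(expected_chance_hits 50 9)"    # 9-base AGATCGGAA
long_adapter="$(expected_chance_hits 50 34)"    # full 34-base read 1 adapter
echo "per-read chance hits: 9-mer=$short_adapter 34-mer=$long_adapter"
```

Real trimmers also accept partial and inexact matches, so these numbers only indicate the order of magnitude: across millions of reads a 9-base string will hit some genuine inserts by chance, while a full-length adapter essentially never will.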

6.5 years ago by Brad Langhorst ▴ 120

I have not used trimmomatic, but it's likely just fine. I typically avoid quality trimming when reads will be used in an alignment downstream, especially when reads are short, since every called base has some value.

6.5 years ago by Brad Langhorst ▴ 120

Thank you so much. Since my data is single end, I only need to trim the 3' adapter and not worry about the 5' adapter?

6.5 years ago by MAPK ★ 2.1k

score 2 · Accepted Answer · 2018-07-18

6.5 years ago

h.mon 35k

The shortest adapter sequence (NEB_sRNA_read_2) is just 9 bp, so I think that with your current settings of k=23 mink=11 it is not being used. Try k=9 mink=6 hdist=0.

The flags tbo and tpe have no effect here, as you have single end data.
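The arithmetic behind this can be sketched as follows, assuming (as the answer suggests) that a reference sequence shorter than the k-mer length contributes no k-mers to match against. The function name is illustrative, and the file names in the commented command are hypothetical.

```shell
#!/usr/bin/env bash
# Number of k-mers of length k that a reference sequence can contribute.
# A sequence shorter than k contributes none.
kmer_count() {
    local seq="$1" k="$2"
    local n=$(( ${#seq} - k + 1 ))
    (( n > 0 )) || n=0
    echo "$n"
}

long_adapter=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC   # 34 bases
short_adapter=AGATCGGAA                            # 9 bases

echo "k=23: long=$(kmer_count "$long_adapter" 23) short=$(kmer_count "$short_adapter" 23)"
echo "k=11: short=$(kmer_count "$short_adapter" 11)"   # mink=11 is still too long
echo "k=9:  short=$(kmer_count "$short_adapter" 9)"

# The corresponding rerun, with hypothetical file names, would look like:
# bbduk.sh in=reads.fastq out=clean.fastq ref=NEB-SE_5_and_3_Prime.fa ktrim=r k=9 mink=6 hdist=0
```

With k=23 and mink=11 the 9-base entry yields zero k-mers, so only the 34-base adapter takes part in matching; at k=9 the short entry contributes exactly one k-mer.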

Thank you so much for your help. Changing to k=9 mink=6 hdist=0 did work for me.

6.5 years ago by MAPK ★ 2.1k