Question

how to know what adapter sequences to trim for RNA-seq?

20 months ago

lunarskye222 • 0

Hello,

I'm very new to RNA-seq analysis and am currently stuck on the trimming step. The libraries I am trying to analyze were built using the NEXTFLEX Rapid Directional RNA-seq kit with their Unique Dual Index Barcodes (https://perkinelmer-appliedgenomics.com/wp-content/uploads/2022/02/NOVA-51292X-NEXTFLEX-RNA-Seq-2-0-UDI-Barcodes-V22-02-new.pdf). They were submitted for 2x111 paired-end sequencing.

I'm trying to use cutadapt to do paired-end trimming but am struggling to understand what exactly I need to trim here. Each barcode has a unique 8 bp index that corresponds to the P5/P7 regions -- is this what I am supposed to trim off? So in cutadapt: -a XXXXXXXX (P5 index) -A XXXXXXXX (P7 index)

Or do I have to trim off the entire udi barcode (which seems rather long since my sequences are only 111bp) like this?: -a AATGATACGGCGACCACCGAGATCTACACXXXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCT -A GATCGGAAGAGCACACGTCTGAACTCCAGTCACXXXXXXXXATCTCGTATGCCGTCTTCTGCTTG

The other thing that confuses me is that when I check the quality of the FASTQ files with FastQC, it always flags an overrepresented TruSeq adapter sequence, even though this adapter was not used during library prep.

If anyone has experience trimming with these barcodes or has any insights, that would be awesome!

fastq RNA-seq cutadapt • 3.6k views

ADD COMMENT • link updated 9 months ago by Brian Bushnell 20k • written 20 months ago by lunarskye222 • 0

As this is Illumina paired-end data, you can also use Trim Galore, which automatically detects and removes adapters.
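A minimal sketch of such a call, assuming Trim Galore is installed as `trim_galore`. The wrapper name `trim_with_trimgalore` and the input file names are made up for illustration.

```shell
# Hypothetical wrapper around Trim Galore for one paired-end sample.
# $1 = read 1 FASTQ, $2 = read 2 FASTQ
trim_with_trimgalore() {
    # --paired keeps the two files in sync and drops a pair
    # when either mate becomes too short after trimming.
    trim_galore --paired "$1" "$2"
}

# Example call (file names are placeholders):
# trim_with_trimgalore reads_R1.fastq.gz reads_R2.fastq.gz
```

With no adapter given, Trim Galore inspects the start of the first file to guess which standard adapter is present, so it is worth checking the trimming report to see what it picked.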

ADD REPLY • link 9 months ago by bioinfo_ga ▴ 70

score 0 · Answer 1 · 2023-03-26

20 months ago

ATpoint 86k

If FastQC doesn't report any adapter contamination, then most likely there is none and you don't need any trimming.

The AATGATACGGCG... primer is part of the TruSeq adapter. Illumina libraries, even those built with custom kits, still need certain sequences to work with the Illumina chemistry, so that is just TruSeq, regardless of the name.

If you need to trim anything, it is likely the standard TruSeq/Universal Adapter sequence from Illumina.
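As a rough sketch of what that could look like in cutadapt: the two sequences below are the standard TruSeq read-through adapters as listed in the cutadapt documentation. The wrapper name, the output file names and the minimum length of 20 are assumptions for illustration, not settings taken from this thread.

```shell
# Standard TruSeq read-through sequences (as given in the cutadapt docs).
R1_ADAPTER=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA   # found at the 3' end of read 1
R2_ADAPTER=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT   # found at the 3' end of read 2

# Hypothetical wrapper: $1 = read 1 FASTQ, $2 = read 2 FASTQ
trim_with_cutadapt() {
    # -a / -A : 3' adapters for read 1 / read 2
    # --minimum-length 20 : assumed cutoff, drops pairs with a very short mate
    # -o / -p : trimmed output for read 1 / read 2
    cutadapt \
        -a "$R1_ADAPTER" -A "$R2_ADAPTER" \
        --minimum-length 20 \
        -o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
        "$1" "$2"
}

# Example call (file names are placeholders):
# trim_with_cutadapt reads_R1.fastq.gz reads_R2.fastq.gz
```

Only the part of the adapter that follows the insert is given. The 8 bp index sits further downstream, and cutadapt removes everything after a 3' adapter match, so the index does not need to be spelled out.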

ADD COMMENT • link 20 months ago by ATpoint 86k

Do you happen to know if the middle part of this UDI barcode (GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCTACATATCTCG) is also part of the TruSeq adapter? This is what shows up as overrepresented in FastQC. Is trimming this sequence from read 1 and read 2 sufficient?

ADD REPLY • link 20 months ago by lunarskye222 • 0

score 0 · Answer 2 · 2023-03-26

20 months ago

Ming Tommy Tang ★ 4.5k

You may use fastp, which detects adapters by itself: https://github.com/OpenGene/fastp#adapters
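A minimal sketch of a paired-end fastp call, assuming fastp is on the PATH. The wrapper name and file names are made up for illustration. Per the linked README, fastp trims paired-end adapters by overlap analysis by default, and sequence-based auto-detection has to be switched on with `--detect_adapter_for_pe`.

```shell
# Hypothetical wrapper around fastp for one paired-end sample.
# $1 = read 1 FASTQ, $2 = read 2 FASTQ
trim_with_fastp() {
    # --in1/--in2   : input read 1 / read 2
    # --out1/--out2 : trimmed output read 1 / read 2
    # --detect_adapter_for_pe : also auto-detect adapter sequences for PE data
    fastp \
        --in1 "$1" --in2 "$2" \
        --out1 trimmed_R1.fastq.gz --out2 trimmed_R2.fastq.gz \
        --detect_adapter_for_pe
}

# Example call (file names are placeholders):
# trim_with_fastp reads_R1.fastq.gz reads_R2.fastq.gz
```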

ADD COMMENT • link 20 months ago by Ming Tommy Tang ★ 4.5k

score 0 · Answer 3 · 2024-03-09

The exact adapter sequence should always be provided to the analyst. But if it is not, you can discern and trim them using BBTools:

(for interleaved reads)

bbmerge.sh in=reads.fq outa=adapters.fa

(for reads in two files, where the # symbol goes where the 1 or 2 is for the file names)

bbmerge.sh in=whatever_R#.fq outa=adapters.fa

Now you need to examine the adapters file to see if it looks legitimate. If it's mostly empty or full of N's, something went wrong and you just need to find out the adapter sequences which were used, and the people upstream of you can tell you that. This can happen if you have long inserts so that most reads don't overlap. Failure in this stage is, in fact, a sign of good library prep.
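One quick way to do that check is to count how many bases the file holds and how many of them are N. This is only a sketch using standard grep and awk; the function name `adapter_n_summary` is made up for illustration, and it is a complement to looking at the sequences by eye, not a replacement.

```shell
# Summarise an adapter FASTA: total bases and how many of them are N.
# $1 = FASTA file, e.g. the adapters.fa written by bbmerge.sh
adapter_n_summary() {
    # Drop header lines (">"), then count all bases and the N/n bases.
    grep -v '^>' "$1" | awk '
        { total += length($0); n += gsub(/[Nn]/, "") }
        END { printf "bases=%d N=%d\n", total, n }'
}

# Run it only if the file from the previous step is present.
if [ -f adapters.fa ]; then
    adapter_n_summary adapters.fa
fi
```

An output such as bases=0 or a count of N close to the total number of bases would point to the failure case described above.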

Then you do (for interleaved reads):

bbduk.sh in=reads.fq out=trimmed.fq ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=100 ref=adapters.fa ftm=5

For twin files you would do this, where the # symbol is at the position that designates the read number in the file:

bbduk.sh in=reads_R#.fq out=trimmed_R#.fq ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=100 ref=adapters.fa ftm=5

Illumina actually has a lot of different adapter sequences they use for different library types. It's always best to ask the people upstream, "Please tell me the adapter sequences you used, plus the barcodes". But in my experience there is a 0% chance of that happening. So the best bet is to use BBMerge to determine the adapter sequence. But if that does not work for some reason, you can do this:

bbduk.sh in=reads.fq out=trimmed.fq ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=100 ref=adapters ftm=5

The flag "ref=adapters" will, if there are no local files named "adapters", use the default Illumina adapters. That's typically a good choice if you are not using custom adapters.