Question

Using UMI-tools on Smart-seq3 RNA-Seq data

0

4 months ago

Agastya ▴ 10

I am trying to analyze some single-embryo RNA-seq data. We use the Smart-seq3 protocol at our NGS facility, and I have received the index-demultiplexed files from the facility. I would like to deduplicate reads using the UMI sequences. From what I understand, this is what a read from the 5' end of a transcript looks like: (image: Smart-seq 5' fragment)

If I understand correctly (from this), the read 1 primer sits on the bottom strand at the s5-ME sequence and extends from there. So read 1 contains, starting at its 5' end, the 5' fragment tag - UMI - cDNA - ME - s7 sequences (in that order). I can specify --bc-pattern=CCCCCCCCCCCNNNNNNNN to extract the fragment tag (11 bp) as the cell barcode and the 8 bp UMI separately from the 5' end of read 1 for the later deduplication step. My confusion is with read 2. The read 2 primer sits on the top strand at the s7-ME sequence and extends as far as ME-s5. Therefore, read 2 looks like cDNA - UMI - 5' fragment tag - ME - s5.

My question is: will UMI-tools (in paired-end mode) be able to detect and extract UMI sequences from the 3' end of read 2? Or do I need to specify --bc-pattern2 using a regex to indicate that read 2 has its UMI at the 3' end? If so, what would the regex pattern be? (I am still learning regex and am not great at it.)

For completeness, I have already trimmed my demultiplexed reads (using Trim Galore with default settings), so there are no adapter sequences on the 3' ends of either read. So read 1 looks like: 5' fragment tag - UMI - cDNA, and read 2 looks like: cDNA - UMI - 5' fragment tag.
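To make the intended read 1 layout concrete, here is a minimal Python sketch of how a string pattern such as CCCCCCCCCCCNNNNNNNN carves up the 5' end of read 1 (C positions go to the cell barcode, N positions to the UMI). It is only an illustration of the layout, not UMI-tools' own code; the function name `split_read1` is hypothetical, and it assumes the pattern contains only C and N characters.

```python
def split_read1(seq, pattern="CCCCCCCCCCCNNNNNNNN"):
    """Split read 1 according to a C/N string pattern applied from the 5' end.

    C positions are collected as the cell barcode (here, the 11 bp fragment tag),
    N positions as the UMI, and everything after the pattern is returned as cDNA.
    Assumption: the pattern holds only 'C' and 'N' characters.
    """
    if len(seq) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    cell = "".join(base for base, p in zip(seq, pattern) if p == "C")
    umi = "".join(base for base, p in zip(seq, pattern) if p == "N")
    return cell, umi, seq[len(pattern):]


# Made-up read: 11 bp tag + 8 bp UMI + cDNA (sequences are illustrative only)
cell, umi, cdna = split_read1("ACGTTGCAAGC" + "AAACCCGG" + "TTGACCGTA")
print(cell, umi, cdna)
```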

rna-seq UMI UMI-tools smart-seq3 • 699 views

ADD COMMENT • link updated 4 months ago by Ram 45k • written 4 months ago by Agastya ▴ 10

score 0 · Answer 1 · 2025-03-21

0

4 months ago

i.sudbery 22k

In paired mode, with --bc-pattern=CCCCCCCCCCCNNNNNNNN, UMI-tools will extract the 5' fragment tag as the cell barcode and the next 8 bp as the UMI, and add them to both read 1 and read 2. There is no need to extract anything from read 2 in order to add its UMI. However, there is no real way, with UMI-tools, to trim the UMI off the 3' end of read 2 in cases where the sequencing reads all the way through. In fact, you are likely to find that the UMI/5' tag is not present on the majority of read 2s, because unless the cDNA insert is short, most read 2s won't reach that far. It would be better to trim it off using an adaptor trimmer, although I'm not sure whether any of the adaptor trimmers accept N bases.
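If no trimmer fits, the read-through case could in principle be handled with a few lines of Python. The sketch below is only an illustration under stated assumptions: because read 2 is sequenced from the opposite strand, the tag and UMI would appear at its 3' end as reverse complements, so it looks for an exact match to the reverse-complemented tag and cuts from 8 bp upstream of it. The function names and the tag sequence in the example are made up, partial or mismatched tags are not handled, and the quality string would need to be cut at the same position.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read2(seq, tag, umi_len=8):
    """Remove revcomp(UMI) + revcomp(tag) from the 3' end of read 2.

    `tag` is given in read 1 orientation. If an exact copy of its reverse
    complement is found, the read is cut umi_len bases upstream of the
    last occurrence; otherwise the read is returned unchanged.
    """
    tag_rc = reverse_complement(tag)
    pos = seq.rfind(tag_rc)
    if pos == -1:
        return seq  # read 2 did not reach the tag
    return seq[:max(pos - umi_len, 0)]


# Illustrative sequences only (not the real Smart-seq3 tag)
tag, umi, cdna = "ACGTTGCAAGC", "AAACCCGG", "TTTTGGGGAC"
read2 = cdna + reverse_complement(umi) + reverse_complement(tag)
print(trim_read2(read2, tag))
```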

ADD COMMENT • link 4 months ago by i.sudbery 22k

0

Okay, so from what I gather, both read 1 and read 2 of a pair get the UMI barcode extracted from read 1? I did a quick grep search and you're right: the majority of read 1s contain the 5' fragment tag, but few read 2s do. I don't think I will trim read 2, because I'm unsure whether it would make a big difference to my alignment output. Thanks so much for the clarification!
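For anyone wanting to repeat that check, a rough Python equivalent of the grep is sketched below. It simply reports the fraction of reads in a FASTQ file that contain a given motif. The function names are hypothetical, matching is exact, and for read 2 the motif to search for would presumably be the reverse complement of the tag.

```python
import gzip
from itertools import islice


def fastq_sequences(path):
    """Yield the sequence line of every record in a (optionally gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of each 4-line FASTQ record
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def fraction_with_motif(seqs, motif):
    """Return the fraction of sequences that contain `motif` as an exact substring."""
    total = hits = 0
    for seq in seqs:
        total += 1
        hits += motif in seq
    return hits / total if total else 0.0


# Small in-memory example with made-up sequences
print(fraction_with_motif(["AAACGT", "TTTT", "ACGTAA", "GGGG"], "ACGT"))
```

With a real file this would be called as `fraction_with_motif(fastq_sequences(path), tag)`, where `path` and `tag` are whatever file and tag sequence apply to your data.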

ADD REPLY • link 4 months ago by Agastya ▴ 10