Question

Demultiplexing and deduplicating using barcodes and UMIs in the "mate" read


5.5 years ago

lech.kaczmarczyk ▴ 50

Hi All,

I'd like to demultiplex and deduplicate reads using in-line barcodes from read2 and UMIs from both read1 and read2. The format is as follows (read2 is a short read containing only the barcode and UMI).

1: UMI-primer-READ
2: barcode-UMI

UMI-tools seems suitable, but I could not find out how to sort reads based on UMI combinations. Do you know of a tool that can handle both UMIs and barcodes, and can sort reads based on combinations of UMIs from both paired-end reads?

Any help with where to start would be highly appreciated.

Cheers, Lech

RNA-Seq next-gen demultiplexing • 5.8k views

updated 6 months ago by i.sudbery 20k • written 5.5 years ago by lech.kaczmarczyk ▴ 50


Could you post a few lines as example? Do you have an index list?

5.5 years ago by Gabriel R. ★ 2.9k


The barcodes are embedded in the gene-specific RT primers, which contain partial Illumina adapters; an example is below (UMI and barcodes in bold). I don't have the reads yet, as I'd like to think the strategy through before starting. The experiment is aimed at assessing T->C conversions (SLAMseq) within the amplicon.

Reverse (RT) primers:

5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNTTTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNATTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNAGTCTCCTGCTTGCTGATCCACATCTGCTG

Forward primer:

5'CACGACGCTCTTCCGATCTNNNNNNGACGTGGACATCCGCAAAGACC

5.5 years ago by lech.kaczmarczyk ▴ 50


But that is assuming the adapters were sequenced. Were there sequencing cycles to sequence the indices?

5.5 years ago by Gabriel R. ★ 2.9k


My plan is to sequence only the UMI and index using read2 (reverse), so 8-9 cycles. Is there a problem with that? If this is an inefficient approach for some reason (e.g. cost), please let me know.

5.5 years ago by lech.kaczmarczyk ▴ 50


I don't get it. You will simply prime next to read2 and hope to run into the barcode+UMI?

5.5 years ago by Gabriel R. ★ 2.9k


Sorry, maybe I was not clear (or it is me who simply doesn't get it):

My barcode+UMI will be located just upstream of the following primer:

Multiplexing Read 2 Sequencing Primer 5' GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

In that case, I get the barcode+UMI sequence in the first 8 cycles. Isn't that right?

5.5 years ago by lech.kaczmarczyk ▴ 50


If you have dedicated cycles, it's OK; I was wondering about the output format.

5.5 years ago by Gabriel R. ★ 2.9k

score 1 · Answer 1 · 2019-05-30

Unfortunately, demultiplexing is not something that UMI-tools does. Are you sure you need to demultiplex? For applications where you just need to quantify the number of reads per gene per cell (like most single-cell RNA-seq experiments), you can go directly to per-cell quantification without demultiplexing first. You would do this with umi_tools whitelist -> umi_tools extract -> read mapping -> umi_tools count.
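To illustrate why quantification does not strictly require demultiplexed files, here is a minimal Python sketch that counts distinct UMIs per (barcode, gene) pair directly from tagged records. It is not how umi_tools count is implemented (umi_tools groups UMIs with error-aware methods rather than exact matching), and the function name, record layout, and example values are all hypothetical.

```python
from collections import defaultdict

def count_unique_umis(records):
    """Count distinct UMIs per (barcode, gene).

    records: iterable of (barcode, umi, gene) tuples, one per mapped read.
    Exact UMI matching only; sequencing errors in UMIs are NOT corrected here.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in records:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical records: barcode and UMI values are made up for illustration.
records = [
    ("TTT", "ACGTA", "geneA"),
    ("TTT", "ACGTA", "geneA"),  # same barcode + UMI + gene: counted once
    ("TTT", "GGCTA", "geneA"),
    ("ATT", "ACGTA", "geneA"),  # same UMI, different barcode: counted separately
]
counts = count_unique_umis(records)
```

The barcode stays attached to each read as a key, so the reads never need to be split into per-barcode files before counting.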

If you do want a per-barcode BAM file, then you need to process the files in two steps. First, remove the UMI using umi_tools extract in paired-end mode, passing read2 to its standard input and read1 to its --read2-in. Then demultiplex using a dedicated demultiplexer; I use reaper from the Enright lab. Then align the read1s, and finally run umi_tools dedup. This, funnily enough, is the workflow that umi_tools was first designed for.
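As a rough illustration of what the extract and demultiplex steps do to each read pair, here is a minimal Python sketch. It is not a replacement for umi_tools extract or reaper. The lengths are assumptions read off the primers posted in the thread (six N in the forward primer, five N in the RT primers), the 3-nt barcode length and the UMI-then-barcode order in read2 are guesses, and all function names and example reads are hypothetical. Swap the slices if the barcode comes first in your reads.

```python
from collections import defaultdict

R1_UMI_LEN = 6   # assumption: NNNNNN at the start of read1
R2_UMI_LEN = 5   # assumption: NNNNN at the start of read2
BARCODE_LEN = 3  # assumption: 3-nt in-line barcode right after the read2 UMI

def extract_pair(name, seq1, qual1, seq2):
    """Move both UMIs into the read name and trim the UMI from read1."""
    umi1 = seq1[:R1_UMI_LEN]
    umi2 = seq2[:R2_UMI_LEN]
    barcode = seq2[R2_UMI_LEN:R2_UMI_LEN + BARCODE_LEN]
    # Combined UMI is appended to the read name after an underscore.
    new_name = f"{name}_{umi1}{umi2}"
    return barcode, (new_name, seq1[R1_UMI_LEN:], qual1[R1_UMI_LEN:])

def demultiplex(pairs, known_barcodes):
    """Group trimmed read1 records by the barcode found in read2."""
    bins = defaultdict(list)
    for name, seq1, qual1, seq2 in pairs:
        barcode, record = extract_pair(name, seq1, qual1, seq2)
        key = barcode if barcode in known_barcodes else "unassigned"
        bins[key].append(record)
    return dict(bins)

# Hypothetical read pairs: (name, read1 sequence, read1 qualities, read2 sequence)
pairs = [
    ("pair_a", "AACCGGGACGTGGACATCC", "I" * 19, "TTGCATTT"),
    ("pair_b", "TTTTTTGACGTGGACATCC", "I" * 19, "CCCCCAGT"),
    ("pair_c", "GGGGGGGACGTGGACATCC", "I" * 19, "AAAAACCC"),  # unknown barcode
]
bins = demultiplex(pairs, {"TTT", "ATT", "AGT"})
```

Because the combined UMI travels in the read name, it survives alignment, and a later deduplication step can compare reads that share a mapping position.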