Demultiplexing and deduplicating using barcodes and UMIs in the "mate" read
5.5 years ago

Hi All,

I'd like to demultiplex and deduplicate reads using in-line barcodes from read2 and UMIs from both read1 and read2. The format is like this (read2 is a short read, containing only the barcode and UMI).

Read 1: UMI-primer-READ
Read 2: barcode-UMI

UMI-tools seems to be suitable, but I failed to find how to sort based on UMI combinations. Maybe you know of a tool that can handle both UMIs and barcodes, and can sort reads based on combinations of UMIs from both PE reads?

Any help with where to start would be highly appreciated.

Cheers, Lech

RNA-Seq next-gen demultiplexing • 5.8k views

Could you post a few lines as example? Do you have an index list?


The barcodes are embedded in the gene-specific RT primers, which contain partial Illumina adapters; an example is below (UMI and barcodes in bold). I don't have the reads yet, as I'd like to have the strategy thought through before starting. The experiment is aimed at assessing T->C conversions (SLAMseq) within the amplicon.

Reverse (RT) primers:

5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNTTTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNATTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNAGTCTCCTGCTTGCTGATCCACATCTGCTG

Forward primer:

5'CACGACGCTCTTCCGATCTNNNNNNGACGTGGACATCCGCAAAGACC
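To make the read layout implied by these primers concrete, here is a minimal Python sketch of how one read pair could be split into sample, combined UMI, and trimmed read 1. It rests on several assumptions that are not stated in the thread: that read 1 starts with the 6-nt UMI (the six Ns in the forward primer), that read 2 starts with the 5-nt UMI followed by a 3-nt barcode (TTT, ATT, or AGT, which is what differs between the three RT primers as posted), and that the sample names are placeholders. If your reads come out barcode-first, swap the two read-2 slices.

```python
from typing import NamedTuple, Optional

# Assumed layout, inferred from the posted primers (not confirmed in the thread):
#   read 1: 6-nt UMI, then the forward gene-specific primer and the amplicon
#   read 2: 5-nt UMI, then a 3-nt sample barcode
R1_UMI_LEN = 6
R2_UMI_LEN = 5
BARCODE_LEN = 3

# Hypothetical sample names; only the barcode sequences come from the primers.
SAMPLE_BARCODES = {"TTT": "sample_1", "ATT": "sample_2", "AGT": "sample_3"}


class ParsedPair(NamedTuple):
    sample: str   # sample name looked up from the read-2 barcode
    umi: str      # read-1 UMI and read-2 UMI joined into one key
    insert: str   # read 1 with its UMI trimmed off


def parse_pair(read1: str, read2: str) -> Optional[ParsedPair]:
    """Split one read pair; return None if it cannot be assigned."""
    if len(read1) < R1_UMI_LEN or len(read2) < R2_UMI_LEN + BARCODE_LEN:
        return None  # read too short to contain the expected fields
    umi1 = read1[:R1_UMI_LEN]
    umi2 = read2[:R2_UMI_LEN]
    barcode = read2[R2_UMI_LEN:R2_UMI_LEN + BARCODE_LEN]
    sample = SAMPLE_BARCODES.get(barcode)
    if sample is None or "N" in umi1 + umi2:
        return None  # unknown barcode or uncalled base in a UMI
    return ParsedPair(sample, umi1 + umi2, read1[R1_UMI_LEN:])
```

Joining the two UMIs into one string gives a single key per molecule, so any downstream grouping treats the read-1 and read-2 UMIs as a combination rather than separately.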


But that is assuming the adapters were sequenced. Were there sequencing cycles to sequence the indices?


My plan is to sequence only the UMI and index using read2 (Reverse), so 8-9 cycles. Is there a problem with that? If this is an inefficient approach for some reason (e.g. cost), please let me know.


I don't get it. You will simply prime next to read2 and hope to run into the barcode+UMI?


Sorry, maybe I was not clear (or it is me who simply doesn't get it):

My barcode+UMI will be located just upstream of the following primer:

Multiplexing Read 2 Sequencing Primer 5' GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

In such a case, I get the barcode+UMI sequence in the first 8 cycles. Isn't that right?


If you have dedicated cycles, it's OK; I was wondering about the output format.

5.5 years ago

Unfortunately, demultiplexing is not something that UMI-tools does. Are you sure you need to demultiplex? For applications where you just need to quantify the number of reads per gene per cell (like most single-cell RNA-seq experiments), you can go directly to the per-cell quantification without first demultiplexing (you would do this with umi_tools whitelist -> umi_tools extract -> read mapping -> umi_tools count).

If you do want a per-barcode BAM file, then you need to process the files in two steps. First, remove the UMI using umi_tools extract in paired-end mode, passing read2 on its standard input and read1 to its --read2-in. Then demultiplex using a dedicated demultiplexer; I use reaper from the Enright lab. Then align the read1s, and finally run umi_tools dedup. This, funnily enough, is the workflow that umi_tools was first designed for.
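To illustrate what the final deduplication step does conceptually, here is a minimal Python sketch that keeps one read per sample, mapping position, and combined UMI. It is not how umi_tools dedup is implemented: by default umi_tools uses a network-based method that also merges UMIs differing by sequencing errors, whereas this sketch only collapses exact matches. The `AlignedRead` record and its field names are hypothetical stand-ins for what you would pull out of a BAM file after alignment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    # Hypothetical record; in practice these fields would come from a BAM file.
    name: str
    sample: str    # sample assigned during demultiplexing
    position: int  # mapping start of read 1
    umi: str       # combined UMI (read-1 UMI + read-2 UMI)


def dedup_exact(reads: Iterable[AlignedRead]) -> Dict[str, List[AlignedRead]]:
    """Keep the first read seen for each (sample, position, UMI) key."""
    seen = set()
    kept: Dict[str, List[AlignedRead]] = defaultdict(list)
    for read in reads:
        key = (read.sample, read.position, read.umi)
        if key in seen:
            continue  # same molecule already represented, treat as a duplicate
        seen.add(key)
        kept[read.sample].append(read)
    return dict(kept)
```

For an amplicon, most reads will share the same mapping position, so in practice the combined UMI does almost all of the work of telling molecules apart. That is worth keeping in mind when judging whether the UMI length gives enough distinct combinations for the expected number of molecules.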


Thanks a lot, this is very informative. As this is not scRNAseq, and the barcodes represent individual samples, separate BAM per barcode would be ideal. I was also thinking about using FastX barcode splitter, after combining the pairs into one read after trimming, etc. Will try your solution!
