Hi All,
I'd like to demultiplex and deduplicate reads using in-line barcodes from read 2 and UMIs from both read 1 and read 2. The format is as follows (read 2 is a short read containing only the barcode and UMI).
Read 1: UMI-primer-READ
Read 2: barcode-UMI
UMI-tools seems suitable, but I could not find out how to sort reads based on UMI combinations. Maybe you know of a tool that can handle both UMIs and barcodes, and can sort reads based on combinations of UMIs from both PE reads?
Any help on where to start would be highly appreciated.
Cheers, Lech
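To make "sorting on UMI combinations" concrete, here is a minimal sketch in plain Python (not UMI-tools) that groups read pairs by barcode plus the concatenated UMIs from both reads and keeps one pair per group. The function names are made up for illustration, and the field positions are passed in as parameters because they depend on the final primer design. The toy reads assume a 6 nt UMI at the start of read 1 and a 5 nt UMI followed by a 3 nt barcode in read 2. Matching is exact, so sequencing errors in the UMI are not corrected, which dedicated tools such as UMI-tools do account for.

```python
from collections import defaultdict


def group_by_barcode_and_umis(pairs, umi1_len, barcode_slice, umi2_slice):
    """Group read pairs by (barcode, read 1 UMI + read 2 UMI).

    pairs: iterable of (read1_seq, read2_seq) strings.
    umi1_len: number of leading read 1 bases that form the read 1 UMI.
    barcode_slice, umi2_slice: slice objects locating the fields in read 2.
    """
    groups = defaultdict(list)
    for read1, read2 in pairs:
        barcode = read2[barcode_slice]
        combined_umi = read1[:umi1_len] + read2[umi2_slice]
        # Store what follows the UMI in read 1 (primer plus amplicon).
        groups[(barcode, combined_umi)].append(read1[umi1_len:])
    return groups


def deduplicate(groups):
    """Keep the first-seen read per (barcode, combined UMI) group."""
    return {key: inserts[0] for key, inserts in groups.items()}


# Made-up toy reads; the field lengths are assumptions, not measured data.
pairs = [
    ("AAAAAAGACGTGGACATCC", "CCCCCTTT"),
    ("AAAAAAGACGTGGACATCC", "CCCCCTTT"),  # same barcode and both UMIs: duplicate
    ("AAAAAAGACGTGGACATCC", "GGGGGTTT"),  # same read 1 UMI, different read 2 UMI
    ("AAAAAAGACGTGGACATCC", "CCCCCATT"),  # same UMIs, different barcode
]
groups = group_by_barcode_and_umis(pairs, 6, slice(5, 8), slice(0, 5))
unique = deduplicate(groups)
print(len(pairs), "pairs ->", len(unique), "unique molecules")
```

Using the pair of UMIs as a single key means two reads count as duplicates only when the barcode and both UMIs agree.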
Could you post a few lines as an example? Do you have an index list?
The barcodes are embedded in the gene-specific RT primers, which contain partial Illumina adapters; an example is below (UMIs and barcodes in bold). I don't have the reads yet, as I'd like to think the strategy through before starting. The experiment is aimed at assessing T->C conversions (SLAMseq) within the amplicon.
Reverse (RT) primers:
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNTTTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNATTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNAGTCTCCTGCTTGCTGATCCACATCTGCTG
Forward primer:
5'CACGACGCTCTTCCGATCTNNNNNNGACGTGGACATCCGCAAAGACC
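The primer layout can be checked directly from the sequences. The sketch below, in plain Python with made-up helper names, finds the run of N bases in each primer (taken here to be the UMI) and lists the positions at which the three reverse primers differ, which is where a sample barcode would have to sit. It only inspects the primer strings above and makes no claim about how the reads will look after sequencing.

```python
import re

REVERSE_PRIMERS = [
    "GTTCAGACGTGTGCTCTTCCGATCTNNNNNTTTCTCCTGCTTGCTGATCCACATCTGCTG",
    "GTTCAGACGTGTGCTCTTCCGATCTNNNNNATTCTCCTGCTTGCTGATCCACATCTGCTG",
    "GTTCAGACGTGTGCTCTTCCGATCTNNNNNAGTCTCCTGCTTGCTGATCCACATCTGCTG",
]
FORWARD_PRIMER = "CACGACGCTCTTCCGATCTNNNNNNGACGTGGACATCCGCAAAGACC"


def umi_span(primer):
    """Return (start, end) of the first run of N bases in the primer."""
    match = re.search(r"N+", primer)
    if match is None:
        raise ValueError("no N run found in primer")
    return match.span()


def variable_positions(primers):
    """Return 0-based positions at which equal-length primers differ."""
    if len({len(p) for p in primers}) != 1:
        raise ValueError("primers must have equal length")
    return [i for i, column in enumerate(zip(*primers)) if len(set(column)) > 1]


fwd_start, fwd_end = umi_span(FORWARD_PRIMER)
rev_start, rev_end = umi_span(REVERSE_PRIMERS[0])
print("forward UMI length:", fwd_end - fwd_start)
print("reverse UMI length:", rev_end - rev_start)
print("variable positions, counted from the reverse UMI start:",
      [i - rev_start for i in variable_positions(REVERSE_PRIMERS)])
```

Counting positions from the start of the N run matches the view of a read that begins right after the adapter part of the primer.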
But that is assuming the adapters were sequenced. Were there sequencing cycles to sequence the indices?
My plan is to sequence only the UMI and index with read 2 (reverse), so 8-9 cycles. Is there a problem with that? If this approach is inefficient for some reason (e.g. cost), please let me know.
I don't get it. You will simply prime next to read2 and hope to run into the barcode+UMI?
Sorry, maybe I was not clear (or maybe it is me who simply doesn't get it):
My barcode+UMI will be located just upstream of the following primer:
Multiplexing Read 2 Sequencing Primer 5' GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
In that case, I get the barcode+UMI sequence in the first 8 cycles. Isn't that right?
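As a rough illustration of what could be done with those 8 cycles, the sketch below splits a short read 2 into UMI and barcode and looks up the sample. It assumes the order follows the reverse primers as written, i.e. 5 random bases first and then 3 sample-specific bases (TTT, ATT, AGT); swap the slices if the barcode comes first. The sample names and function names are hypothetical.

```python
# Assumed layout of the 8-cycle read 2: 5 nt UMI, then 3 nt barcode.
UMI_LEN = 5
BARCODE_LEN = 3

# Barcode-to-sample table; the sample names are hypothetical.
SAMPLES = {"TTT": "sample_1", "ATT": "sample_2", "AGT": "sample_3"}


def split_read2(read2):
    """Split a short read 2 into (umi, barcode); reject truncated reads."""
    if len(read2) < UMI_LEN + BARCODE_LEN:
        raise ValueError("read 2 is shorter than UMI + barcode")
    umi = read2[:UMI_LEN]
    barcode = read2[UMI_LEN:UMI_LEN + BARCODE_LEN]
    return umi, barcode


def assign_sample(read2):
    """Return (sample, umi); sample is None when the barcode is unknown."""
    umi, barcode = split_read2(read2)
    return SAMPLES.get(barcode), umi


# Made-up reads: two with known barcodes, one with an unknown barcode.
for read2 in ["ACGTAATT", "GGCATAGT", "ACGTACCC"]:
    print(read2, assign_sample(read2))
```

One thing this makes visible: TTT and ATT, and likewise ATT and AGT, differ at a single base, so exact matching cannot detect a single substitution that turns one barcode into another.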
If you have dedicated cycles, it's OK. I was wondering about the output format.