Hello, everyone
I have single-end miRNA reads from a library prepared with the NEXTFLEX Small RNA-Seq Kit. After trimming the Illumina adaptors, the read structure is:
| UMI 1 (4 nt) | miRNA | UMI 2 (4 nt) | Adaptor | Remaining sequence |
I want to use umi_tools extract to add the UMI sequences to the read name. Can I achieve this with the following regex?
--bc-pattern=`(?P<umi_1>.{4}).+(?P<umi_2>.{4})(?P<discard_1>TGGAATTCTCGGGTGCCAAGG){s<=1}(?P<discard_2>.+)`
Thanks a lot in advance
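One way to sanity-check a pattern like this before running it on a whole FASTQ file is to try it on a synthetic read in Python. The sketch below is only an approximation of the pattern above: it uses the standard-library `re` module, which has no fuzzy matching, so the `{s<=1}` mismatch allowance is dropped and the adaptor must match exactly. The group names `insert` and `rest`, the helper `split_read`, and the example read are made up for illustration and are not part of umi_tools.

```python
import re

# Adaptor sequence copied from the pattern in the question.
ADAPTOR = "TGGAATTCTCGGGTGCCAAGG"

# Exact-match simplification: stdlib `re` cannot express {s<=1},
# so one-mismatch adaptors will NOT match in this sketch.
READ_PATTERN = re.compile(
    r"^(?P<umi_1>.{4})"   # first 4-nt UMI
    r"(?P<insert>.+)"     # miRNA insert (hypothetical group name)
    r"(?P<umi_2>.{4})"    # second 4-nt UMI, directly before the adaptor
    + ADAPTOR +
    r"(?P<rest>.*)$"      # remaining sequence; may be empty in this sketch
)


def split_read(seq):
    """Return the combined UMI and the insert, or None if the read does not match."""
    match = READ_PATTERN.match(seq)
    if match is None:
        return None
    return {
        "umi": match.group("umi_1") + match.group("umi_2"),
        "insert": match.group("insert"),
    }


# Synthetic read: UMI 1 + arbitrary 22-nt insert + UMI 2 + adaptor + trailing bases.
example_insert = "TGAGGTAGTAGGTTGTATAGTT"
example_read = "ACGT" + example_insert + "TTAG" + ADAPTOR + "AAAA"

result = split_read(example_read)
print(result)
```

Two details differ from the pattern in the question and are choices of this sketch, not recommendations: the trailing group uses `.*` rather than `.+`, so a read that ends exactly at the adaptor still matches, and the insert is captured in a named group purely so it can be printed. As far as I know, umi_tools itself only interprets the pattern as a regex when `--extract-method=regex` is also given.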
Thank you very much! I wasn't aware that version 4 doesn't use randomized bases.
Isn't this a common issue with any ligation-based protocol? We currently use the QIAseq small RNA kit and have also observed ligation bias for certain miRNAs.
Yes, there's bias introduced in every small RNA-seq method, since miRNAs have their own secondary structures (which may block the miRNA ends) and there's also miRNA-adapter cofolding to consider. I just meant that the randomized bases themselves can't be assumed to have a random distribution, since each miRNA has its own preference for certain "random" bases. Qiagen adds their UMIs during reverse transcription, so those UMIs should have more of a random distribution.
To be fair, I've never seen a UMI dataset where the UMI usage looks random.
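For anyone who wants to look at this in their own data, a rough check is to compare the Shannon entropy of the observed UMIs with the maximum possible for that UMI length (2 bits per base if all four bases were equally likely at every position). This is only a sketch: the function names are made up, the input is assumed to be a plain list of UMI strings already extracted from the read names, and with few reads the observed entropy is capped by the number of reads anyway, so a low value is not by itself proof of ligation bias.

```python
import math
from collections import Counter


def umi_entropy_bits(umis):
    """Shannon entropy (in bits) of the observed UMI frequency distribution."""
    counts = Counter(umis)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_entropy_bits(umi_length):
    """Entropy of a uniform distribution over all 4**umi_length possible UMIs."""
    return 2.0 * umi_length


# Toy input: four different 4-nt UMIs, each seen once.
toy_umis = ["AAAA", "CCCC", "GGGG", "TTTT"]
observed = umi_entropy_bits(toy_umis)
print(observed, "of at most", max_entropy_bits(4), "bits")
```

Running it separately per miRNA would be closer to the point made above, since the preference for particular "random" bases is expected to differ between miRNAs.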