Hi
I have amplicon sequencing data where I included a UMI (unique molecular identifier) in my reads to allow me to correct sequencing errors. I have removed the UMIs from the reads and added them to a tag in the FASTQ files. I have then aligned the reads to my reference and would now like to make consensus reads for those with the same UMI, i.e., those that arose from the same DNA molecule. The sequence data is very noisy and there are many indels in the reads.
I have tried fgbio, but it cannot handle indels. I have also tried gencore, which is designed for paired-end data but should work with the UMIs on single reads; however, it left the data unchanged, even on the least stringent settings possible. Does anyone know of a tool that can do what I need?
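To make the goal concrete, here is a minimal Python sketch of per-UMI consensus calling in reference coordinates. It is not how fgbio, gencore, or any other tool works; the function names `project_to_reference` and `umi_consensus` and the `(umi, pos, cigar, seq)` input layout are assumptions made for illustration. It takes a simple majority vote per reference position, treats deletions as a vote for "no base", and ignores insertions and base qualities, so it is probably too naive for very noisy data.

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def project_to_reference(pos, cigar, seq):
    """Map one aligned read onto reference coordinates.

    Returns {ref_position: base}. Deleted reference positions get '-'.
    Inserted and soft-clipped bases are skipped (a simplification).
    """
    bases = {}
    ref, query = pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes reference and query
            for i in range(length):
                bases[ref + i] = seq[query + i]
            ref += length
            query += length
        elif op == "D":            # deletion: consumes reference only
            for i in range(length):
                bases[ref + i] = "-"
            ref += length
        elif op == "N":            # skipped region: consumes reference only
            ref += length
        elif op in "IS":           # insertion / soft clip: consumes query only
            query += length
        # H and P consume neither reference nor query
    return bases


def umi_consensus(reads, min_reads=2):
    """Majority-vote consensus per UMI.

    reads: iterable of (umi, pos, cigar, seq) tuples (hypothetical layout).
    Returns {umi: (start_position, consensus_sequence)}.
    """
    groups = defaultdict(list)
    for umi, pos, cigar, seq in reads:
        groups[umi].append(project_to_reference(pos, cigar, seq))

    result = {}
    for umi, members in groups.items():
        if len(members) < min_reads:
            continue               # too few reads to correct errors
        columns = defaultdict(Counter)
        for bases in members:
            for ref_pos, base in bases.items():
                columns[ref_pos][base] += 1
        start = min(columns)
        calls = []
        for ref_pos in range(start, max(columns) + 1):
            counts = columns.get(ref_pos)
            if not counts:
                calls.append("N")  # no read covers this position
                continue
            ranked = counts.most_common(2)
            tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
            call = "N" if tied else ranked[0][0]
            if call != "-":        # a majority deletion emits no base
                calls.append(call)
        result[umi] = (start, "".join(calls))
    return result


example_reads = [
    ("AAAA", 0, "8M", "ACGTACGT"),
    ("AAAA", 0, "3M1D4M", "ACGACGT"),     # spurious deletion
    ("AAAA", 0, "4M1I4M", "ACGTTACGT"),   # spurious insertion
    ("AAAA", 0, "8M", "ACGTACCT"),        # substitution error
    ("CCCC", 0, "8M", "ACGTACGT"),        # singleton UMI, skipped
]
print(umi_consensus(example_reads))
```

In practice the tuples would come from the aligned BAM (position, CIGAR, sequence) plus the UMI tag, and a real solution would also need to handle insertions and weight votes by base quality.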
Thanks, but Cablib can only deal with paired-end reads (I wasn't clear in my initial question), so I can't use it without a lot of customization (or maybe I could filter by length and then just split the FASTQs?). I hope you get the funding to implement it in UMI-tools; I'm sure it would be used a lot with the increase in nanopore assays!