I have a fastq file where each read has a barcode, and each barcode corresponds to an individual. The barcode sits in the middle of the read and is flanked by primer sequences. The fastq file contains 6 barcodes. I want a way to quantify (count) the number of times each barcode appears in the fastq file, while also taking sequencing errors in the reads into account.
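A minimal pure-Python sketch of the counting task follows. It makes several assumptions that are not in the question: there is one known pair of flanking primers (`left` and `right`; all names and example sequences here are hypothetical), primers are matched exactly for simplicity (real ONT reads also carry errors in the primers, so an approximate primer search would be needed in practice), reads may come from either strand, and barcode errors are scored with Levenshtein distance because nanopore errors include insertions and deletions, not just substitutions. A read whose closest barcode is tied with another is rejected rather than guessed.

```python
from collections import Counter

# Complement table for reverse-complementing reads from the opposite strand.
COMP = str.maketrans("ACGTN", "TGCAN")

def revcomp(s):
    return s.translate(COMP)[::-1]

def levenshtein(a, b):
    """Dynamic-programming edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def read_fastq(lines):
    """Yield sequences from an iterable of fastq lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at end of file
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line
        yield seq

def extract_barcode(seq, left, right):
    """Return the substring between the two primers, or None if either is absent."""
    start = seq.find(left)
    if start == -1:
        return None
    start += len(left)
    end = seq.find(right, start)
    if end == -1:
        return None
    return seq[start:end]

def assign(candidate, barcodes, max_dist):
    """Return the unique closest barcode within max_dist, else None."""
    scored = sorted((levenshtein(candidate, bc), bc) for bc in barcodes)
    best_d, best_bc = scored[0]
    if best_d > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == best_d:
        return None  # ambiguous: two barcodes equally close
    return best_bc

def count_barcodes(lines, barcodes, left, right, max_dist=1):
    """Count reads per barcode; extra keys track reads that could not be assigned."""
    counts = Counter()
    for seq in read_fastq(lines):
        cand = extract_barcode(seq, left, right)
        if cand is None:  # try the opposite strand
            cand = extract_barcode(revcomp(seq), left, right)
        if cand is None:
            counts["no_primer"] += 1
            continue
        bc = assign(cand, barcodes, max_dist)
        counts[bc if bc else "unassigned"] += 1
    return counts
```

Usage would look like `count_barcodes(open("reads.fastq"), barcodes, left, right)` with the real primer and barcode sequences filled in. With only 6 barcodes this brute-force comparison is cheap per read; the `max_dist` threshold controls how conservative the assignment is.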
I wrote a program to do this, but it performs extremely poorly when there are 10k barcodes in a big fastq file. The data comes from an ONT (Oxford Nanopore) machine.
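If the slowdown comes from comparing every read against every barcode, one common alternative is to precompute an index that maps every sequence within edit distance 1 of a barcode directly to that barcode, so each read needs only a single dictionary lookup. The sketch below assumes an error budget of exactly 1 edit and that candidate barcode strings have already been cut out of the reads (for example, the sequence between the two primers); the function names are hypothetical. Sequences within distance 1 of two different barcodes are dropped as ambiguous, while exact barcode matches always win.

```python
from collections import Counter

BASES = "ACGT"

def edit1_neighbors(seq):
    """seq plus every sequence one substitution, insertion, or deletion away."""
    out = {seq}
    for i in range(len(seq) + 1):
        for b in BASES:
            out.add(seq[:i] + b + seq[i:])          # insertion before position i
        if i < len(seq):
            out.add(seq[:i] + seq[i + 1:])          # deletion of position i
            for b in BASES:
                out.add(seq[:i] + b + seq[i + 1:])  # substitution at position i
    return out

def build_index(barcodes):
    """Map each sequence within edit distance 1 to its unique nearest barcode."""
    exact = set(barcodes)
    index = {bc: bc for bc in exact}  # exact matches always map to themselves
    ambiguous = set()
    for bc in exact:
        for nb in edit1_neighbors(bc):
            if nb in exact or nb in ambiguous:
                continue
            owner = index.get(nb)
            if owner is None:
                index[nb] = bc
            elif owner != bc:
                del index[nb]        # reachable from two barcodes: drop it
                ambiguous.add(nb)
    return index

def count_with_index(candidates, index):
    """Count candidates with O(1) lookups; misses are counted as 'unassigned'."""
    counts = Counter()
    for cand in candidates:
        counts[index.get(cand, "unassigned")] += 1
    return counts
```

For a barcode of length L the neighborhood has at most about 8L + 5 entries, so 10k barcodes of length 12 give an index on the order of a million keys, which fits comfortably in memory. The trade-off is that the index is tied to one fixed error budget; allowing 2 edits would make it much larger.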
I'm looking for any existing tool for this task. I already searched on Google, but I only find tools for barcode demultiplexing.
Any help will be appreciated.
If you open-source your code and share it on GitHub/GitLab, anyone can help you optimize it.
I will surely do it, but I need a quick fix now.
Since this appears to be LAMPseq data, there is some software available here. It is not designed for long reads, but it may be usable.
I am aware of that software, but I felt there was an easier way to do this. BBDuk is an easy option, and it can be tuned to give a conservative estimate of barcode counts. Thanks for the answer.