Hello,
I have sequencing data from a library prep, and after removing the adapter sequences I am left with about 2e6 reads, each a ~20 nt barcode. I have a file listing all of the possible barcodes (~5000, also ~20 nt each) and would like to know the number of times each barcode appears among the reads.
I know this is probably a more powerful tool than I need, but I was thinking of using Bowtie 2 to bin the reads, using a reference genome where each barcode is its own chromosome. However, I'm not sure how to convert the table of all the barcodes into the correct format to feed into Bowtie 2.
Any help would be much appreciated. Thanks!
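On the format question: Bowtie 2 builds its index from a FASTA file, so each barcode needs to become its own FASTA record. Below is a minimal sketch, assuming the barcode table is a plain text file with one sequence per line. The function name, the file names `barcodes.txt` and `barcodes.fa`, and the `bc00001`-style record names are all made up for illustration; adjust them to the actual table layout.

```python
def barcodes_to_fasta(barcodes):
    """Return FASTA text with one record per non-empty barcode line.

    Assumes each item is a bare sequence (one barcode per line).
    Record names like 'bc00001' are hypothetical placeholders.
    """
    # Normalise whitespace and case, then drop blank lines.
    seqs = [b.strip().upper() for b in barcodes]
    seqs = [s for s in seqs if s]
    # Each barcode becomes its own "chromosome" in the reference.
    return "".join(f">bc{i:05d}\n{s}\n" for i, s in enumerate(seqs, start=1))


if __name__ == "__main__":
    # Hypothetical file names; change to match your data.
    with open("barcodes.txt") as src, open("barcodes.fa", "w") as dst:
        dst.write(barcodes_to_fasta(src))
```

The resulting FASTA file can then be indexed with `bowtie2-build barcodes.fa barcodes`.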
If your reads and barcodes are both 20 bp and you are looking for exact matches, then simply using UNIX grep in full-word match mode (or ideally seqkit grep (LINK)) may be the easiest option.
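For the exact-match case, the same count can also be sketched in a few lines of Python. This is only an illustration, assuming the trimmed reads are available as plain sequences, one per line (a FASTQ file would need its sequence lines extracted first). The function name and the file names are hypothetical.

```python
def count_barcodes(reads, barcodes):
    """Count exact matches of each known barcode among the reads.

    Assumes both inputs are iterables of bare sequences.
    Reads that match no known barcode are ignored.
    """
    # Start every known barcode at zero so absent ones are still reported.
    counts = dict.fromkeys(
        (b.strip().upper() for b in barcodes if b.strip()), 0
    )
    for read in reads:
        seq = read.strip().upper()
        if seq in counts:  # exact, full-length match only
            counts[seq] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file names; change to match your data.
    with open("barcodes.txt") as bc, open("reads.txt") as rd:
        for barcode, n in count_barcodes(rd, bc).items():
            print(barcode, n, sep="\t")
```

A dictionary lookup per read keeps this to a single pass over the ~2e6 reads, but it will not tolerate mismatches or length differences; those cases are where an aligner becomes useful.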