Hi All,
I completed a single cell DNA barcoding experiment and have the .fastq file. The reads in the .fastq file are the 40 bp cell barcodes. Is there a way to count the frequency of the barcodes in the .fastq file de novo? I would like to determine how many different 40 bp barcodes are present in the population, and then count them.
Maybe this needs to be done in 2 steps: first work out which barcode sequences are present, then use those sequences in a counting step?
Is there any advice on how to do this?
Best,
Joe
Try the whitelist command in UMI-tools: https://umi-tools.readthedocs.io/en/latest/reference/whitelist.html
The difficulty is deciding which detected barcodes are real and which are just noise. Read through the docs; they explain how this is handled.
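If you just want a first look at the raw counts before dealing with the noise question, a plain exact-match tally is easy to do yourself. The following is only a minimal sketch, not what UMI-tools does internally: it assumes a standard 4-line-per-record FASTQ (no wrapped sequence lines), that each read is exactly the 40 bp barcode, and that reads containing N can be skipped. The function name `count_barcodes` and its `barcode_length` parameter are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_barcodes(fastq_path, barcode_length=40):
    """Tally exact barcode sequences in a FASTQ file (hypothetical helper).

    Assumes 4 lines per record, so the sequence is every 4th line
    starting from the 2nd line of the file.
    """
    # Transparently handle gzip-compressed input based on the file name.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line in islice(handle, 1, None, 4):
            seq = line.strip().upper()
            # Assumption: drop reads of the wrong length or with ambiguous bases.
            if len(seq) == barcode_length and "N" not in seq:
                counts[seq] += 1
    return counts
```

Calling `count_barcodes("barcodes.fastq")` (the file name is a placeholder) returns a `Counter`, so `len(counts)` is the number of distinct barcodes and `counts.most_common(20)` lists the most frequent ones. Keep in mind this counts every sequence that differs by a single base as its own barcode, so sequencing errors will inflate the number of distinct barcodes; deciding which ones are real is the part the UMI-tools docs cover.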