Dear All,
We have been using the Nanopore Rapid Barcoding kit (SQK-RBK110.96) for SARS-CoV-2 whole-genome sequencing. While exploring the data, we noticed a couple of things. First, a lot of our reads end up in the "unclassified" folder. When we looked at the reads in "fast5_pass", about 60% of 4000 reads contain the expected barcode as a perfect match. If we relax the match stringency (allowing 1, 2 or 3 mismatches), another 20% of the reads can be explained. In the "unclassified" folder, however, one would expect to see no perfect matches to any barcode: as far as I know, one criterion for a read to be "unclassified" is that its barcode score is below the threshold (in our case the default, 60). In other words, there should be almost no perfect barcode matches among those reads, yet we do see them. So our questions are:
1) How is this barcode score of 60 calculated? How does Guppy assign a barcode score to each read, and is it anything like Phred scores in Illumina data?
2) How exactly is a read assigned to "unclassified", especially when it has one barcode attached to it? Is there an algorithm that scores these barcodes differently?
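For anyone who wants to reproduce this kind of check, here is a minimal sketch of counting reads that contain a barcode with 0 to 3 mismatches. It is only an illustration under stated assumptions: it counts substitutions only (no insertions or deletions), it searches both strands, and the barcode sequence and the function names (`min_mismatches`, `tally_barcode_matches`) are placeholders, not a real ONT barcode or anything Guppy does internally.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def min_mismatches(read: str, barcode: str) -> int:
    """Smallest Hamming distance between the barcode and any
    same-length window of the read (substitutions only)."""
    n, m = len(read), len(barcode)
    if n < m:
        return m  # read shorter than the barcode: treat as no match
    best = m
    for i in range(n - m + 1):
        window = read[i:i + m]
        dist = sum(a != b for a, b in zip(window, barcode))
        if dist < best:
            best = dist
            if best == 0:
                break  # cannot do better than a perfect match
    return best


def tally_barcode_matches(reads, barcode: str, max_mismatches: int = 3) -> Counter:
    """Count reads by their best mismatch count against the barcode
    (either strand). Reads above max_mismatches are counted as 'none'."""
    counts = Counter()
    rc = reverse_complement(barcode)
    for read in reads:
        dist = min(min_mismatches(read, barcode), min_mismatches(read, rc))
        counts[dist if dist <= max_mismatches else "none"] += 1
    return counts


if __name__ == "__main__":
    # Placeholder barcode and toy reads, for illustration only.
    barcode = "ACGTACGTAC"
    reads = ["TTTACGTACGTACTTT", "TTTACGTACCTACTTT", "GGGGGGGGGGGGGGGG"]
    print(tally_barcode_matches(reads, barcode))
```

Running the same tally on the reads from the "unclassified" folder and from a barcode folder gives the kind of comparison described above. A plain sequence match like this is not necessarily what the demultiplexer scores, so a perfect hit here does not by itself imply a barcode score above the threshold.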
Thank you for your responses,
Best,
Ege
I have the same question. Have you figured out this problem?
Hi Henry,
I have no idea how barcode scores are calculated; it would be worth asking this question in the ONT community forum. The score is not Phred-based like Illumina quality scores; rather, the base-calling models are trained to detect the barcode signal in the fast5 files.
Other than this, I had a similar problem, with many reads from an ONT run ending up as "unclassified". I managed to solve it by upgrading to the latest version of Guppy (v6.1.1).
I hope this helps.