Question

Why does Guppy output different sequence data (same model)?

19 months ago

Ludwig Kian Soon • 0

I ran a nanopore sequencing run on a Mk1C device with live basecalling and obtained FASTQ files in the fastq_pass and fastq_fail folders. I then tried to rerun the basecalling on a different machine, but found that the two produce different sequences even in a single test case, e.g.:

In the FASTQ basecalled on the Mk1C:

> @b2e79451-050f-4d74-b091-40e6b6ee2229 runid=43a904bc3c628e3d9e32355643c0236ba632c012 read=110919 ch=21 start_time=2023-04-13T14:22:57.269028+00:00 flow_cell_id=FAR89176 protocol_group_id=STARRS sample_id=no_sample barcode=barcode01 barcode_alias=barcode01 parent_read_id=b2e79451-050f-4d74-b091-40e6b6ee2229 basecall_model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9
ATTTATCCTTGTACTTCCAGTTGCAGTAGGTGTTTAACCAGAAAGTTGTAAGTGTCGCTGTGGTTTTCGCATTTATCGTGAAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCAGTATTTGAAATCTTTATATCTTGATTAATTTCATTTCCGTTTGAAATTGCTGATTTGTTGTCTAACTTTAAACTTGTGTCCGATGTTTTTTAACAGCACCTTCATTTTTATTTTGTCTTTTGTCGTATTTTTATTAGCATTTAA

And when I re-basecalled it on my workstation:

> @b2e79451-050f-4d74-b091-40e6b6ee2229 runid=43a904bc3c628e3d9e32355643c0236ba632c012 sampleid=no_sample read=110919 ch=21 start_time=2023-04-13T04:52:57Z model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9
ATTTATCCTTGTACTTCCAGTTGCAGGTAGGTGTTTAACCAGAAAGTTGTAAGTGTCGCTGTGGTTTTCGCATTTATCGTGAAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCAGTATTTGAAATCTTTATATCTTGATTAATTTCATTTCCGTTTGAAATTGCTGATTTGTTGTCTAACTTTAAACTTGTGTCCGATGTTTTTTAACAGCACCTTCATTTTTATTTTGTCTTTTGTCGTATTTTTATTAGCATTTAA
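For reference, the two sequences above can be compared programmatically rather than by eye. The following is a minimal sketch and not part of either Guppy run: `locate_difference` is a hypothetical helper that trims the shared prefix and suffix, so it only gives a meaningful answer when the two reads differ in one contiguous region. The two strings are short excerpts copied from near the start of the reads above.

```python
def locate_difference(a: str, b: str):
    """Return (offset, a_only, b_only) for the region where a and b differ.

    Trims the longest common prefix, then the longest common suffix of
    what remains. Assumes the sequences differ in one contiguous region.
    """
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Never let the suffix overlap the prefix already consumed.
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, a[prefix:len(a) - suffix], b[prefix:len(b) - suffix]


# Excerpts around the first mismatch between the two reads quoted above.
mk1c = "TTGCAGTAGGTG"
workstation = "TTGCAGGTAGGTG"

offset, mk1c_only, workstation_only = locate_difference(mk1c, workstation)
print(offset, repr(mk1c_only), repr(workstation_only))
```

On these excerpts the workstation read carries one extra G next to an existing G, i.e. a difference in homopolymer length. For reads that differ in several places, a proper pairwise alignment would be needed instead of this prefix/suffix trim.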

The specific fast5 file I tested came from the fast5_pass folder, and I can find the read ID and run_id in the fastq_pass file from the Mk1C, yet after re-basecalling the read in the example was put into the fastq_fail folder. There are also some reads whose lengths differ slightly between the two runs, so my questions are:

1: Am I doing something wrong? I have:
- set the model used for basecalling to be the same one (--flowcell "FLO-MIN106" --kit "SQK-RBK110-96" with high accuracy)
- set the min_qscore to be the same

The only difference I can find is the version of the Guppy basecaller:
- on the Mk1C: Version 6.4.6+ae70e8fa0, minimap2 version 2.24-r1122
- on the workstation: Version 6.4.8+31becc9, minimap2 version 2.24-r1122
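One way to double-check that both runs really report the same model is to parse the key=value fields of the FASTQ headers. This is a minimal sketch, assuming headers shaped like the two shown earlier; `parse_header` and `model_of` are hypothetical helpers, and the header strings below are shortened copies that keep only a few of the original fields. Note that the two outputs name the model field differently (`basecall_model_version_id` versus `model_version_id`).

```python
def parse_header(line: str) -> dict:
    """Split a FASTQ header line into its read id and key=value fields."""
    read_id, *fields = line.lstrip("@").split()
    parsed = {"read_id": read_id}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            parsed[key] = value
    return parsed


def model_of(header: dict):
    """Return the model id under either field name seen in the two outputs."""
    return header.get("basecall_model_version_id") or header.get("model_version_id")


# Shortened copies of the two headers quoted in the question.
mk1c_header = parse_header(
    "@b2e79451-050f-4d74-b091-40e6b6ee2229 read=110919 ch=21 "
    "basecall_model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9"
)
workstation_header = parse_header(
    "@b2e79451-050f-4d74-b091-40e6b6ee2229 read=110919 ch=21 "
    "model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9"
)

same_read = mk1c_header["read_id"] == workstation_header["read_id"]
same_model = model_of(mk1c_header) == model_of(workstation_header)
print(same_read, same_model)
```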

2: Given the situation, should I (1) re-basecall everything with the newer version of Guppy, or (2) basecall only my fast5_skip files and use them in combination with the existing basecalled data?

guppy Nanopore • 1.1k views

updated 19 months ago by GenoMax 147k • written 19 months ago by Ludwig Kian Soon • 0

score 0 · Answer 1 · 2023-04-24

If you have a GPU, just re-basecall everything with SUP accuracy with the most recent Guppy.

It is impossible to say what is going on here from the evidence of a single read. When comparing nanopore basecalling or runs, think about read distributions, Q values, mapped accuracy (cramino is good for this), and file sizes, not single reads.
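As a rough illustration of comparing distributions instead of single reads, the sketch below summarizes read count, median length, and median per-read Q for a FASTQ stream. It rests on assumptions that are not from this thread: plain 4-line FASTQ records, Phred+33 quality encoding, and a per-read mean Q obtained by averaging error probabilities rather than raw Q values. `read_fastq`, `mean_qscore`, and `summarize` are hypothetical helper names, and the two records are toy data, not real reads.

```python
import io
import math
import statistics


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield header[1:].split()[0], seq, qual


def mean_qscore(qual: str) -> float:
    """Per-read mean Q, averaged in error-probability space (Phred+33 assumed)."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def summarize(handle) -> dict:
    """Collect simple distribution statistics over all non-empty reads."""
    lengths, qscores = [], []
    for _, seq, qual in read_fastq(handle):
        if not qual:
            continue  # skip empty records
        lengths.append(len(seq))
        qscores.append(mean_qscore(qual))
    return {
        "reads": len(lengths),
        "median_length": statistics.median(lengths),
        "median_q": statistics.median(qscores),
    }


# Toy records for demonstration only; real use would open a FASTQ file instead.
toy = io.StringIO("@r1 ch=1\nACGT\n+\nIIII\n@r2 ch=2\nACGTAC\n+\n555555\n")
summary = summarize(toy)
print(summary)
```

Running `summarize` on the Mk1C output and on the re-basecalled output and comparing the two dictionaries gives a first impression of whether the runs differ systematically. A read whose mean Q sits close to the min_qscore threshold can plausibly land in pass for one run and fail for the other after a small change in its quality string.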