I ran nanopore sequencing on an Mk1C device with live basecalling and obtained FASTQ files in the fastq_pass and fastq_fail folders. I then tried to rerun the basecalling on a different machine, but found that the two runs produce different output even in a single test case, e.g.:
In the FASTQ basecalled on the Mk1C:
> @b2e79451-050f-4d74-b091-40e6b6ee2229 runid=43a904bc3c628e3d9e32355643c0236ba632c012 read=110919 ch=21 start_time=2023-04-13T14:22:57.269028+00:00 flow_cell_id=FAR89176 protocol_group_id=STARRS sample_id=no_sample barcode=barcode01 barcode_alias=barcode01 parent_read_id=b2e79451-050f-4d74-b091-40e6b6ee2229 basecall_model_version_id=2021-05-17_dna_r9.4.1_mini
And when I rebasecalled it on my workstation:
> @b2e79451-050f-4d74-b091-40e6b6ee2229 runid=43a904bc3c628e3d9e32355643c0236ba632c012 sampleid=no_sample read=110919 ch=21 start_time=2023-04-13T04:52:57Z model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9
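To see exactly which header fields differ between the two outputs, both header lines can be parsed into key/value pairs and compared. This is only a minimal sketch: `parse_header` and `diff_headers` are hypothetical helper names, and the two example strings are shortened to a few fields copied from the headers above.

```python
def parse_header(header):
    """Split a FASTQ header into (read_id, {key: value})."""
    tokens = header.lstrip("@").split()
    read_id, fields = tokens[0], {}
    for token in tokens[1:]:
        # Split on the first '=' only, so values containing '=' stay intact.
        key, _, value = token.partition("=")
        fields[key] = value
    return read_id, fields


def diff_headers(a, b):
    """Return {key: (value_in_a, value_in_b)} for fields that differ or are missing."""
    id_a, fields_a = parse_header(a)
    id_b, fields_b = parse_header(b)
    diff = {} if id_a == id_b else {"read_id": (id_a, id_b)}
    for key in sorted(set(fields_a) | set(fields_b)):
        if fields_a.get(key) != fields_b.get(key):
            diff[key] = (fields_a.get(key), fields_b.get(key))
    return diff


# Shortened headers: only some of the fields shown above are included.
mk1c = ("@b2e79451-050f-4d74-b091-40e6b6ee2229 "
        "runid=43a904bc3c628e3d9e32355643c0236ba632c012 "
        "read=110919 ch=21 start_time=2023-04-13T14:22:57.269028+00:00")
workstation = ("@b2e79451-050f-4d74-b091-40e6b6ee2229 "
               "runid=43a904bc3c628e3d9e32355643c0236ba632c012 "
               "sampleid=no_sample read=110919 ch=21 "
               "start_time=2023-04-13T04:52:57Z")

for key, (left, right) in diff_headers(mk1c, workstation).items():
    print(f"{key}: {left!r} vs {right!r}")
```

Fields that exist under different names in the two outputs (for example `sample_id` and `sampleid`) show up as missing on one side, so the result still needs to be read by eye.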
Although the specific fast5 file that I tested came from the fast5_pass folder, and I can find the read and run_id in the fastq_pass file, after rebasecalling the sequence in the example was put into the fastq_fail folder. There are also some sequences whose lengths differ slightly, so my questions are:
1: Am I doing something wrong? I have:
- set the model used for basecalling to be the same one (--flowcell "FLO-MIN106" --kit "SQK-RBK110-96" with high accuracy)
- set the min_qscore to be the same
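To check whether a read that moved between pass and fail sits close to the threshold, the mean q-score can be recomputed from the FASTQ quality string. The sketch below assumes the commonly used definition (average the per-base error probabilities, then convert back to a Phred score) and Phred+33 encoding. Whether Guppy applies exactly this calculation when filtering is an assumption here, and `MIN_QSCORE` is a placeholder value.

```python
import math


def mean_qscore(quality, offset=33):
    """Mean Phred score from a FASTQ quality string.

    Averages error probabilities rather than raw scores, so a few
    low-quality bases pull the mean down strongly.
    """
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(char) - offset) / 10) for char in quality]
    return -10 * math.log10(sum(probs) / len(probs))


MIN_QSCORE = 9  # placeholder; use the value actually passed to --min_qscore

# Toy quality string, not taken from the real data: 'I' is Q40, '5' is Q20.
toy_quality = "I5"
score = mean_qscore(toy_quality)
print(f"mean q-score {score:.2f}, passes: {score >= MIN_QSCORE}")
```

Running this on the same read from both outputs shows whether the two basecalls land on opposite sides of the threshold.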
The only difference I can find is the version of the Guppy basecaller:
- on the Mk1C: Version 6.4.6+ae70e8fa0, minimap2 version 2.24-r1122
- on the workstation: Version 6.4.8+31becc9, minimap2 version 2.24-r1122
2: Given the situation, should I (1) rebasecall everything with the newer version of Guppy, or (2) basecall only my fast5_skip files and combine them with the existing basecalled data?
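To put a number on how much the two basecalls disagree before choosing between the two options, the FASTQ files can be compared read by read. This is a rough sketch that assumes plain four-line FASTQ records. `read_fastq` and `compare_fastq` are hypothetical helpers, and the reads in the example are made-up toy data rather than the real output.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines
            continue
        sequence = handle.readline().strip()
        handle.readline()       # '+' separator line
        handle.readline()       # quality line (not needed here)
        yield header[1:].split()[0], sequence


def compare_fastq(handle_a, handle_b):
    """Count identical and differing reads between two FASTQ handles."""
    a = dict(read_fastq(handle_a))
    b = dict(read_fastq(handle_b))
    shared = a.keys() & b.keys()
    return {
        "shared": len(shared),
        "identical": sum(a[r] == b[r] for r in shared),
        "length_differs": sum(len(a[r]) != len(b[r]) for r in shared),
        "only_in_a": len(a.keys() - b.keys()),
        "only_in_b": len(b.keys() - a.keys()),
    }


# Toy records for illustration only.
toy_a = io.StringIO("@r1 ch=1\nACGT\n+\nIIII\n@r2 ch=2\nACGTA\n+\nIIIII\n")
toy_b = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nAA\n+\nII\n")
summary = compare_fastq(toy_a, toy_b)
print(summary)
```

With real data the two `io.StringIO` objects would be replaced by opened files, with the pass and fail FASTQ files from each machine concatenated first, since a read can sit in different folders in the two outputs.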