Hello,
I performed a MinION sequencing run using the MinKNOW software, and 4 days after sequencing finished only 24% of the bases had been called. So I decided to do the base calling on a more powerful computer using Guppy. But I am not sure which data I should take as input. The input file format for Guppy is .fast5.
I suppose that the .fast5 files are generated after base calling, and that if I continue base calling via MinKNOW, it will take about 2 weeks to complete.
I also found a "queued_reads" folder in my data. It contains 512 folders (which I suppose correspond to the 512 channels of the MinION flow cell), and these folders contain files in .raw format. Is there any software that can continue base calling with these .raw files as input? Or is the only way to wait until 100% of the .fast5 files are generated?
Thank you, Anastasiia
Hi Wouter,
I did as you advised and now I am at the base calling stage. So, my next question is quite specific: how do I use the Guppy tool to get the best performance out of the computer? I have a Linux system with 16 cores and 2 threads per core (and 125 GB RAM). How do I specify GPU base calling? For this run I added the optional argument --cpu_threads_per_caller 16, but when I came to work the next day I found that only 10% of the process had finished. Moreover, I found in the Guppy documentation that "if GPU base calling is run, modification of number of CPU threads per caller is not effective". If so, is there any other way to increase the base calling speed?
Many thanks, Anastasiia
Do you have access to the nanopore community forum? More about guppy can be found there.
For running on GPU you need to set the --device parameter, but I'm not sure how you should do that correctly on your system. If I basecall stuff it is on the PromethION, and there it is --device "cuda:0 cuda:1 cuda:2 cuda:3".
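To make the --device syntax concrete, here is a minimal sketch that assembles a guppy_basecaller command line for one or more CUDA devices. The directory names and the config file name are assumptions for illustration only (the right config depends on your flow cell and kit), and guppy_gpu_command is a hypothetical helper, not part of Guppy.

```python
import shlex


def guppy_gpu_command(input_dir, output_dir, config, devices):
    """Build a guppy_basecaller argument list targeting the given CUDA device indices."""
    # Guppy expects one space-separated string, e.g. "cuda:0 cuda:1".
    device_arg = " ".join(f"cuda:{index}" for index in devices)
    return [
        "guppy_basecaller",
        "--input_path", input_dir,    # folder with the .fast5 files (assumed name)
        "--save_path", output_dir,    # where the basecalled output goes (assumed name)
        "--config", config,           # assumed config name, check it matches your flow cell
        "--device", device_arg,
    ]


# A single-GPU workstation would normally use device index 0.
cmd = guppy_gpu_command("fast5/", "basecalled/", "dna_r9.4.1_450bps_hac.cfg", [0])
print(shlex.join(cmd))
```

The list form can be passed straight to subprocess.run, which avoids quoting problems when several devices are given as one argument.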
The only thing I managed to find on the Community is the documentation for Guppy, but I guess I should start with the settings of my video card in order to run GPU base calling. Anyway, thanks a lot for the answer!
Can you tell us about the GPU you have in your system?
If it doesn't have an adequate GPU and you are forced to do CPU basecalling, then I would recommend using Guppy's "fast" config. CPU basecalling with the default config takes days to weeks on a single machine.
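As a rough sketch of what a CPU-only run with the "fast" config could look like: the helper below (hypothetical, not part of Guppy) divides the logical CPUs between --num_callers and --cpu_threads_per_caller, since the total number of basecalling threads is the product of the two. The config name assumes an R9.4.1 flow cell, and the value of 4 threads per caller is only a starting point to experiment with, not a recommendation from the Guppy documentation.

```python
def guppy_cpu_command(input_dir, output_dir, logical_cpus,
                      threads_per_caller=4,
                      config="dna_r9.4.1_450bps_fast.cfg"):
    """Build a CPU-only guppy_basecaller argument list using the 'fast' config."""
    if logical_cpus < 1 or threads_per_caller < 1:
        raise ValueError("logical_cpus and threads_per_caller must be positive")
    # Keep num_callers * threads_per_caller within the available logical CPUs.
    num_callers = max(1, logical_cpus // threads_per_caller)
    return [
        "guppy_basecaller",
        "--input_path", input_dir,    # assumed folder name
        "--save_path", output_dir,    # assumed folder name
        "--config", config,           # assumed: 'fast' model for an R9.4.1 flow cell
        "--num_callers", str(num_callers),
        "--cpu_threads_per_caller", str(threads_per_caller),
    ]


# 16 cores with 2 threads each gives 32 logical CPUs.
cmd = guppy_cpu_command("fast5/", "basecalled_fast/", logical_cpus=32)
print(" ".join(cmd))
```

With 32 logical CPUs and 4 threads per caller this yields 8 callers, so all hardware threads are used without oversubscribing the machine.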
Yep, I do high accuracy mode calling on a SLURM cluster, since I don't have a GPU that works with Guppy. It takes 6-7 times as long as fast mode calling. We need a GPU and/or a MinIT. If you massively split the input, it is quite quick on a cluster too.
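The idea of splitting the input can be sketched as follows: distribute the .fast5 files over a fixed number of batches, then give each batch to its own cluster job. This is only an illustration under assumed file names; split_into_batches is a hypothetical helper, and how each batch is handed to Guppy (for example, one input directory per batch) is left to your own job script.

```python
def split_into_batches(files, n_batches):
    """Distribute files round-robin over at most n_batches non-empty batches."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    ordered = sorted(files)  # stable order so reruns give the same batches
    batches = [ordered[i::n_batches] for i in range(n_batches)]
    # Drop empty batches when there are fewer files than requested batches.
    return [batch for batch in batches if batch]


# Assumed file names, for illustration only.
fast5_files = [f"reads_{i}.fast5" for i in range(10)]
batches = split_into_batches(fast5_files, 4)
for job_id, batch in enumerate(batches):
    print(job_id, len(batch), batch[0])
```

Round-robin assignment keeps the batch sizes within one file of each other, so the jobs should finish at roughly the same time if the files are similar in size.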
I have an NVIDIA GeForce 1030, and I've already realized that my GPU is not adequate for Guppy base calling. Could you tell me how to specify this "fast" config? I didn't manage to find this in the Guppy documentation. Moreover, part of my files have already been basecalled through MinKNOW, so I'm afraid that the output data would differ from the rest of the files, which would be base called in another manner.
Try :
I get a lot of different workflows. Eg.
In my SLURM script I have the following: