Hello all,
I have a ~296GB .fast5 file which is the result of a metagenomic effort. I don't have a precise understanding of the history of this file, but my guess is that it is the product of multiple FAST5 files being concatenated, perhaps incorrectly.
Others in the past have used .bam files produced from this data, and they contained millions of reads, as expected.
For my analysis, I cannot use those .bam files, as I would like to basecall the data differently. However, when I attempt to use the nanopore basecaller dorado on the file, or when I view the file in an HDF5 viewer, both programs tell me there are only 4000 reads.
Any recommendations to access the other reads in these data?
It looks like the simple concatenation is causing programs to read only up to the end of the first file, perhaps? You could try converting the FAST5 file into POD5 format and then using that with dorado. POD5 files are far faster to work with than FAST5, so if this works you would get the dual benefit of recovering the data and basecalling much faster.
Converting to POD5 doesn't help, unfortunately; it only converts 4000 reads.
It looks like, unless you have a way of doing some low-level manipulation of the file (or access to the original separate FAST5 files), you may be stuck without access to the remaining data. I don't know whether you could simply split the file and try the pieces independently (that will depend on the FAST5 file format).
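To test the concatenation guess before splitting anything, one option is to scan the file for the 8-byte signature that every HDF5 file starts with. Below is a minimal Python sketch using only the standard library. The function names, the output naming scheme, and the chunk sizes are my own choices for illustration, not part of any ONT tool. It assumes the pieces were joined byte-for-byte (e.g. with cat); a match could in principle also be a chance occurrence inside the signal data, so treat the offsets as candidates only.

```python
import os

# Every HDF5 file starts with this fixed 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signatures(path, chunk_size=64 * 1024 * 1024):
    """Return the byte offsets at which the HDF5 signature occurs in `path`."""
    offsets = []
    overlap = len(HDF5_SIGNATURE) - 1  # bytes carried over between chunks
    with open(path, "rb") as handle:
        position = 0  # file offset of the first byte of `buffer`
        tail = b""
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            buffer = tail + chunk
            start = 0
            while True:
                hit = buffer.find(HDF5_SIGNATURE, start)
                if hit == -1:
                    break
                offsets.append(position + hit)
                start = hit + 1
            # Keep the last 7 bytes so a signature that straddles two
            # chunks is still found on the next pass.
            tail = buffer[-overlap:]
            position += len(buffer) - len(tail)
    return offsets


def carve_segments(path, offsets, out_dir, block_size=16 * 1024 * 1024):
    """Copy each [offset, next offset) byte range of `path` to its own file.

    Hypothetical naming scheme: segment_00000.fast5, segment_00001.fast5, ...
    """
    os.makedirs(out_dir, exist_ok=True)
    bounds = list(offsets) + [os.path.getsize(path)]
    written = []
    with open(path, "rb") as source:
        for index in range(len(offsets)):
            begin, end = bounds[index], bounds[index + 1]
            out_path = os.path.join(out_dir, f"segment_{index:05d}.fast5")
            source.seek(begin)
            remaining = end - begin
            with open(out_path, "wb") as target:
                while remaining > 0:
                    block = source.read(min(remaining, block_size))
                    if not block:
                        break
                    target.write(block)
                    remaining -= len(block)
            written.append(out_path)
    return written


# Example use (paths are placeholders):
#   offsets = find_hdf5_signatures("big.fast5")
#   print(len(offsets), offsets[:10])
#   carve_segments("big.fast5", offsets, "carved")
```

If the scan reports more than one offset, each carved piece can be tried on its own with dorado or the pod5 converter. If it reports only the offset 0, the file is probably not a plain concatenation and this approach won't help. Carving duplicates the data, so it needs roughly another 296GB of free disk space.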
It feels like you should have a look at the documentation of the pod5 tool. It is stated that:
The output itself has only 4000 reads; I'm not going by the progress bar.
You can try parsing the file with slow5tools to convert into slow5/blow5 format to see if the file can be rescued. If so, you can then basecall using their dorado fork (https://github.com/hiruna72/slow5-dorado), or convert into pod5 with blue-crab (https://github.com/Psy-Fer/blue-crab) then use 'regular' dorado.
slow5tools also only converts 4000 reads, much like the pod5 converter.
Tricky, but fast5 files are hdf5 container files. You might be able to retrieve your data via a manual script.
There is some information on it here: https://labs.epi2me.io/notebooks/Introduction_to_Fast5_files.html
Hi, I ran into the same problem. Have you found a solution?