Question

Cannot process all the reads in a fast5 file?

20 months ago

Gio • 0

Hello all,

I have a ~296 GB .fast5 file that is the result of a metagenomic effort. I don't know the exact history of this file, but my guess is that it is the product of multiple FAST5 files being concatenated, perhaps incorrectly.

Others have previously used .bam files produced from this data, and those contained millions of reads, as expected.

For my analysis, I cannot use those .bam files because I would like to basecall the data differently. However, when I run the nanopore basecaller dorado on the file, or open the file in an HDF5 viewer, both tools report only 4000 reads.

Any recommendations to access the other reads in these data?

metagenome base-calling fastq nanopore • 3.7k views

ADD COMMENT • link updated 12 months ago by Ram 45k • written 20 months ago by Gio • 0

It looks like the simple concatenation may be causing programs to read only up to the end of the first file. You could try converting the fast5 file to POD5 format and then using that with dorado. POD5 files are much faster to process than fast5, so if this works you get the dual benefit of recovering the data and basecalling it much faster.

ADD REPLY • link 20 months ago by GenoMax 151k

Unfortunately, converting to POD5 doesn't help. It only converts 4000 reads.

ADD REPLY • link 20 months ago by Gio • 0

It looks like, unless you have a way of doing some low-level manipulation of the file (or access to the original separate fast5 files), you may not be able to access the remaining data. I don't know whether you could simply split the file and try the pieces independently (that will depend on the fast5 file format).
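If the file really is several complete fast5 files joined end to end, one way to attempt the split is to look for the 8-byte HDF5 file signature, which marks the start of an HDF5 superblock, and cut the file at each occurrence. The following is only a rough sketch under that assumption: the function names, the `segment_*.fast5` naming, and the chunk sizes are made up for illustration, and a signature could in principle also appear by chance inside signal data, so each piece still needs to be checked by opening it.

```python
from pathlib import Path

# 8-byte magic number that begins every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signatures(path, chunk_size=64 * 1024 * 1024):
    """Return the byte offsets of every HDF5 signature found in `path`.

    The file is streamed in chunks, so it never has to fit in memory.
    """
    offsets = []
    overlap = len(HDF5_SIGNATURE) - 1  # bytes carried over between chunks
    tail = b""
    position = 0  # file offset of the first byte of the current chunk
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            buffer = tail + chunk
            base = position - len(tail)  # file offset of buffer[0]
            start = 0
            while True:
                hit = buffer.find(HDF5_SIGNATURE, start)
                if hit == -1:
                    break
                offsets.append(base + hit)
                start = hit + 1
            # Keep the last 7 bytes so a signature that straddles two
            # chunks is still found (and never counted twice).
            tail = buffer[-overlap:]
            position += len(chunk)
    return offsets


def carve_segments(path, offsets, out_dir, block_size=16 * 1024 * 1024):
    """Write the bytes between consecutive offsets to separate files.

    Hypothetical naming: segment_00000.fast5, segment_00001.fast5, ...
    Bytes before the first offset, if any, are ignored.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    bounds = list(offsets) + [Path(path).stat().st_size]
    written = []
    with open(path, "rb") as src:
        for index, (start, end) in enumerate(zip(bounds, bounds[1:])):
            target = out_dir / f"segment_{index:05d}.fast5"
            src.seek(start)
            remaining = end - start
            with open(target, "wb") as dst:
                while remaining > 0:
                    block = src.read(min(remaining, block_size))
                    if not block:
                        break
                    dst.write(block)
                    remaining -= len(block)
            written.append(target)
    return written
```

For example, `offsets = find_hdf5_signatures("big.fast5")` followed by `carve_segments("big.fast5", offsets, "pieces")`, where `big.fast5` and `pieces` are placeholder names. If only one offset comes back, the file is probably not a plain concatenation and this approach will not help. Carving duplicates the data, so it needs roughly as much free disk space as the original file.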

ADD REPLY • link 20 months ago by GenoMax 151k

You may want to have a look at the documentation of the pod5 tool. It states:

The progress bar shown during conversion assumes the number of reads in an input .fast5 is 4000. The progress bar will update the total value during runtime if required.

ADD REPLY • link 19 months ago by bioAddict • 0

The output itself contains only 4000 reads. I am not going by the progress bar.

ADD REPLY • link 19 months ago by Gio • 0

You can try parsing the file with slow5tools to convert into slow5/blow5 format to see if the file can be rescued. If so, you can then basecall using their dorado fork (https://github.com/hiruna72/slow5-dorado), or convert into pod5 with blue-crab (https://github.com/Psy-Fer/blue-crab) then use 'regular' dorado.

ADD REPLY • link 20 months ago by cfos4698 ★ 1.1k

slow5tools also converts only 4000 reads, much like the pod5 converter.

ADD REPLY • link 20 months ago by Gio • 0

Tricky, but fast5 files are HDF5 container files. You might be able to retrieve your data with a manual script.

There is some information here on it : https://labs.epi2me.io/notebooks/Introduction_to_Fast5_files.html
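As a starting point for such a script, here is a minimal sketch that lists the reads in a multi-read fast5 file with the third-party h5py package. It assumes the usual multi-read layout, where each read is a top-level group named `read_<read_id>` holding its raw samples in a `Raw/Signal` dataset. The function name `list_reads` is made up for illustration. On a concatenated file h5py will most likely still see only the first embedded file, so this is mainly useful for checking what a given file or extracted piece actually contains.

```python
import h5py  # third-party: pip install h5py


def list_reads(fast5_path):
    """Return (read_id, n_samples) for each read in a multi-read fast5.

    Assumes top-level groups named "read_<read_id>", each with a
    "Raw/Signal" dataset. Groups without that dataset are skipped.
    """
    reads = []
    with h5py.File(fast5_path, "r") as f5:
        for name, group in f5.items():
            if not name.startswith("read_"):
                continue
            if "Raw/Signal" not in group:
                continue
            signal = group["Raw/Signal"]
            reads.append((name[len("read_"):], signal.shape[0]))
    return reads
```

Calling `len(list_reads("some_file.fast5"))`, with a placeholder file name, gives the number of reads h5py can see, which can be compared against what dorado or the pod5 converter reports.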