For 454 runs, our current maximum is 4.3 G. However, Ion Torrent-derived SFF files may be much bigger (our largest Ion Torrent SFF file is 23 G after compression with bzip2...)
I have also seen only small files (way below 4 GB).
The thing is that the IndexOffset field is defined in the SFF documentation as 8 bytes. This means that a file could be MUCH bigger than 4 GB. But I guess it is a "just in case" precaution: they made that field 8 bytes so they can expand in the future without updating the file format definition.
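To make the field widths concrete, here is a minimal sketch of reading the fixed part of the SFF common header. It assumes the layout as I read it in the SFF format description (big-endian, 31 bytes, with an 8-byte index offset but only a 4-byte index length); the function name and dictionary keys are my own, not part of any library.

```python
import struct

# Fixed-size part of the SFF common header (assumed layout, big-endian, no padding):
# magic(4) version(4) index_offset(8) index_length(4) number_of_reads(4)
# header_length(2) key_length(2) number_of_flows_per_read(2) flowgram_format_code(1)
SFF_COMMON_HEADER = struct.Struct(">I4sQIIHHHB")
SFF_MAGIC = 0x2E736666  # the ASCII bytes ".sff"


def parse_sff_common_header(data):
    """Parse the first 31 bytes of an SFF file into a dict (hypothetical helper)."""
    if len(data) < SFF_COMMON_HEADER.size:
        raise ValueError("need at least %d bytes" % SFF_COMMON_HEADER.size)
    (magic, version, index_offset, index_length, number_of_reads,
     header_length, key_length, flows_per_read,
     flowgram_format_code) = SFF_COMMON_HEADER.unpack_from(data)
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file (bad magic number)")
    return {
        "version": version,
        "index_offset": index_offset,    # 8 bytes: can point beyond 4 GB
        "index_length": index_length,    # 4 bytes: on-disk index stays below 4 GiB
        "number_of_reads": number_of_reads,
        "header_length": header_length,
        "key_length": key_length,
        "flows_per_read": flows_per_read,
        "flowgram_format_code": flowgram_format_code,
    }
```

On a real file you would call it as `parse_sff_common_header(fh.read(31))` with `fh` opened in binary mode. If the layout above is right, the index may sit anywhere in a huge file, but the index block itself cannot grow past what a 4-byte length can express.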
The second reason that makes me believe that SFF files were designed to be small is that SFF already has support for an index (which is optional, it is true). For LARGE files, the index itself will take a lot of RAM, maybe more than is available, so it would be pointless to store an index if you cannot load it.
-----------
My question is: should I bother loading the index (since it is already built into the file) or totally ignore it to keep the memory footprint small? If the SFF files were designed to be 'small' (under 4 GB), it would make sense to use the built-in index (when available, of course).
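One way to frame that choice is as a simple check against a memory budget. This is only a sketch: the function name and the budget parameter are hypothetical, and it assumes (my reading of the SFF format description) that a file without an index stores 0 in both the index offset and index length header fields.

```python
def should_load_index(index_offset, index_length, memory_budget_bytes):
    """Return True if the file has an index and its on-disk size fits the budget.

    Assumption: index_offset == 0 and index_length == 0 mean "no index".
    """
    has_index = index_offset != 0 and index_length != 0
    return has_index and index_length <= memory_budget_bytes
```

Keep in mind that the on-disk length is only a lower bound: once parsed into in-memory structures, the index will usually take more RAM than its raw byte count.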
Well, the 454 platform has been retired, so you should account for that. Also, as I mentioned before, we almost never need to access unaligned reads in a random fashion, so any resources you devote to this are unnecessary.
The SFF file also contains other information on the sequencing process (flowgrams) that may be useful to incorporate.
So, 4.3 GB is the size of the bzip2 file? Which would mean the SFF itself is roughly double that size?