Let me first clarify that the Ion Torrent technology is not image based: it works by measuring pH variations that are translated into voltage differences. This is the main reason why this technique is more accurate and much faster than SOLiD, for instance. So in terms of disk space we definitely avoid one of the hardest issues to handle from previous technologies. Still, data size has to be studied.
This week we are having the Ion Torrent lab training, and today we attended an introduction to Ion Torrent for the general public in which data sizes were mentioned. They stressed that the raw data arriving at the Torrent Server is ~30GB for the 314 chip (10Mb), compared to the ~170GB of reads that a paired-end exome experiment generates on the SOLiD, and that after base calling you end up with ~100MB of final results in FASTQ format. Keep in mind that the server has ~12TB of usable storage.
We have asked for the entire presentation, which covers the data sizes in detail, so if you are interested in the data formats and sizes for the 314 and 316 chips I'll be happy to share them here.
EDIT: I have found this Ion Torrent presentation, which is somewhat similar to the one I attended today; the data flow I mentioned is covered on page 20.
EDIT2: I have just received the PDF of the presentation we were given yesterday by the LifeTech lab support team, so I can now show you some information about data sizes at the different process steps. Here is the table that summarizes the data produced at each step of the Ion Torrent pipeline:
Process Description   File Types   314 chip   316 chip
Raw Voltage Data      DAT          29 GB      141 GB
Signal Processing     WELLS        1.4 GB     7 GB
Base Calls - Flow     SFF          0.6 GB     3 GB
Base Calls - Base     FASTQ        0.1 GB     0.5 GB
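To put those numbers in perspective, here is a minimal Python sketch that turns the table into reduction factors and a rough count of how many runs would fit on the server. The sizes are copied from the table; everything else is my own assumption for illustration: the names `PIPELINE`, `reduction_factors` and `runs_that_fit` are made up, 1 TB is taken as 1000 GB, and the whole ~12TB is assumed to be free for run data.

```python
# Sizes in GB per run, copied from the table in the presentation.
# Tuple layout: (process step, file type, 314 chip, 316 chip)
PIPELINE = [
    ("Raw Voltage Data", "DAT", 29.0, 141.0),
    ("Signal Processing", "WELLS", 1.4, 7.0),
    ("Base Calls - Flow", "SFF", 0.6, 3.0),
    ("Base Calls - Base", "FASTQ", 0.1, 0.5),
]

# Assumption: ~12 TB usable, 1 TB = 1000 GB, all of it available for runs.
USABLE_STORAGE_GB = 12 * 1000

# Column index of each chip inside the PIPELINE tuples.
CHIP_COLUMN = {"314": 2, "316": 3}


def reduction_factors(chip):
    """Return {file_type: raw_size / size_at_that_step} for one chip."""
    col = CHIP_COLUMN[chip]
    raw = PIPELINE[0][col]
    return {row[1]: raw / row[col] for row in PIPELINE}


def runs_that_fit(chip, kept_types):
    """Number of whole runs that fit if only `kept_types` files are kept."""
    col = CHIP_COLUMN[chip]
    per_run = sum(row[col] for row in PIPELINE if row[1] in kept_types)
    return int(USABLE_STORAGE_GB // per_run)


if __name__ == "__main__":
    for chip in ("314", "316"):
        print("chip", chip)
        for file_type, factor in reduction_factors(chip).items():
            print("  %-5s is %.0fx smaller than the raw DAT" % (file_type, factor))
        print("  runs keeping everything:", runs_that_fit(chip, {"DAT", "WELLS", "SFF", "FASTQ"}))
        print("  runs keeping only DAT:", runs_that_fit(chip, {"DAT"}))
        print("  runs keeping SFF + FASTQ:", runs_that_fit(chip, {"SFF", "FASTQ"}))
```

Under these assumptions the server would hold roughly 413 runs of raw 314 data but only about 85 runs of raw 316 data, which is why the question of whether to keep the DAT files matters much more for the bigger chip.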
Great, thanks a lot for sharing those details!
Thanks for the info. It would be great if you could share those details on data formats and sizes here!
I guess one of the main questions (since more and more groups are moving away from storing data from before the base-calling step?) is the relative coverage needed to get the same accuracy as Illumina, SOLiD, etc., since that will determine how much data must be stored and handled in the data analysis step ... (for us at UPPMAX, that means the amount of data we have to store on our (expensive) center-wide parallel file system, used in the analyses)
Regarding images: yes, that is why I was alluding to "not requiring to save raw images". But I realize now that Ion Torrent of course has its own kind of "raw data". Still, if accuracy is better, maybe there will be less need to save this raw data?
Great, thanks a lot for those details!