Question

How Are Storage Requirements Affected By The New Ion Torrent Sequencing Technology?

2

13.8 years ago

Samuel Lampa ★ 1.3k

A paper about the new Ion Torrent technology was published in Nature the other day.

... but it says nothing about the amount of data produced. Has anybody heard any details on this?

At the very least, not having to store raw images (since this is a "no image" method) should save some storage space, but I would be interested to know more details.

next-gen sequencing data • 5.4k views

ADD COMMENT • link updated 13.8 years ago by Jorge Amigo 14k • written 13.8 years ago by Samuel Lampa ★ 1.3k

score 8 · Answer 1 · 2011-07-21

8

13.8 years ago

Jorge Amigo 14k

let me first clarify that the Ion Torrent technology is not image based: it works by measuring pH variations that are translated into voltage differences. this is the main reason why this technique is more accurate and much faster than SOLiD, for instance. so in terms of disk size we definitely avoid one of the hardest-to-handle issues of previous technologies. still, the data size has to be studied.

this week we are having the Ion Torrent lab training, and today we attended an Ion Torrent introduction for the general public in which data sizes were mentioned. in it, they stressed that the raw data that arrives at the Torrent Server is ~30GB for the 314 chip (10Mb), compared to the ~170GB of reads that a paired-end exome experiment generates on the SOLiD, and that after base calling you end up with ~100MB of final results in fastq format. keep in mind that the server has ~12TB of usable storage.
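To put those figures side by side, here is a rough back-of-the-envelope sketch of how many 314 chip runs would fit on the server. It uses only the approximate numbers quoted above; the decimal unit conversion (1 TB = 1,000,000 MB) and the function name are assumptions made for illustration, and real servers also need space for intermediate files.

```python
# Approximate figures quoted in the answer above, expressed in MB so that
# the arithmetic stays in integers. Decimal units are an assumption.
RAW_PER_314_RUN_MB = 30_000        # ~30 GB of raw data per 314 chip run
FASTQ_PER_314_RUN_MB = 100         # ~100 MB of FASTQ after base calling
SERVER_USABLE_MB = 12_000_000      # ~12 TB of usable server storage


def runs_that_fit(per_run_mb: int, capacity_mb: int) -> int:
    """Return how many whole runs of the given size fit in the capacity."""
    return capacity_mb // per_run_mb


raw_runs = runs_that_fit(RAW_PER_314_RUN_MB, SERVER_USABLE_MB)
fastq_runs = runs_that_fit(FASTQ_PER_314_RUN_MB, SERVER_USABLE_MB)

print(f"314 runs kept as raw data:   {raw_runs}")
print(f"314 runs kept as FASTQ only: {fastq_runs}")
```

Under these assumptions, keeping only the FASTQ output stretches the same disk across 300 times as many runs as keeping the raw data.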

we have asked for the entire presentation, in which the data sizes are highly detailed, so in case you are interested in knowing data formats and sizes for chips 314 and 316 I'll be happy to share them here.

EDIT: I have found this Ion Torrent presentation, which is somewhat similar to the one I attended today; the data flow I mentioned is covered on page 20.

EDIT2: I have just received the PDF of the presentation we were given yesterday by the LifeTech lab support team, and I am now able to show you some information about data sizes on the different process steps. Here is the table that describes the data summarization of the Ion Torrent pipeline:

| Process Description | File Types | 314 chip | 316 chip |
|---------------------|------------|----------|----------|
| Raw Voltage Data    | DAT        | 29 GB    | 141 GB   |
| Signal Processing   | WELLS      | 1.4 GB   | 7 GB     |
| Base Calls - Flow   | SFF        | 0.6 GB   | 3 GB     |
| Base Calls - Base   | FASTQ      | 0.1 GB   | 0.5 GB   |
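As a quick illustration of how strongly each pipeline step shrinks the data, the sizes reported in EDIT2 can be turned into reduction factors relative to the raw DAT files. The dictionary layout and function name below are made up for this sketch; only the sizes come from the presentation figures quoted above.

```python
# File sizes in GB per pipeline step, as reported in EDIT2 above.
SIZES_GB = {
    "DAT":   {"314": 29.0, "316": 141.0},  # raw voltage data
    "WELLS": {"314": 1.4,  "316": 7.0},    # signal processing
    "SFF":   {"314": 0.6,  "316": 3.0},    # base calls, flow space
    "FASTQ": {"314": 0.1,  "316": 0.5},    # base calls, base space
}


def reduction_vs_raw(chip: str) -> dict:
    """Return, per file type, how many times smaller it is than the raw DAT data."""
    raw = SIZES_GB["DAT"][chip]
    return {file_type: raw / sizes[chip] for file_type, sizes in SIZES_GB.items()}


for chip in ("314", "316"):
    for file_type, factor in reduction_vs_raw(chip).items():
        print(f"{chip} chip, {file_type}: about {factor:.0f}x smaller than raw")
```

For both chips the final FASTQ comes out a little under 300 times smaller than the raw voltage data, which is why the choice of which stage to archive matters far more than the chip type.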

ADD COMMENT • link 13.8 years ago by Jorge Amigo 14k

1

Great, Thanks a lot for sharing those details!

ADD REPLY • link 13.8 years ago by Samuel Lampa ★ 1.3k

0

Thanks for the info. That would be great if you would like to share those details on data formats and sizes here!

ADD REPLY • link 13.8 years ago by Samuel Lampa ★ 1.3k

0

I guess one of the main questions (since more and more groups are moving away from storing data from before the base-calling step) is the relative coverage needed to get the same accuracy as Illumina, SOLiD, etc., since that will determine how much data must be stored and handled in the data analysis step ... (for us at UPPMAX, that would mean the amount of data we have to store on our (expensive) center-wide parallel file system, which is used in the analyses)

ADD REPLY • link 13.8 years ago by Samuel Lampa ★ 1.3k

0

Re. images: Yeah, that is why I was alluding to "not requiring to save raw images". But I realize now that Ion Torrent of course has its own kind of "raw data". But if accuracy is better, maybe there will be less need to save this raw data?

ADD REPLY • link 13.8 years ago by Samuel Lampa ★ 1.3k

0

Great, Thanks a lot for those details!

ADD REPLY • link 13.8 years ago by Samuel Lampa ★ 1.3k

score 2 · Answer 2 · 2011-07-21

2

13.8 years ago

lh3 33k

I heard that Ion Torrent produces SFF/SFF-like files at the end of the pipeline, like 454. The two technologies have much in common (e.g. error rate, homopolymer errors, read length, etc.). To me, it does not matter how much space the intermediate files take. Yes, Illumina image files are huge, but you do not get images now. Many sequencing centers even drop intensity files after a while and only keep fastq/BAM in the long term, which are pretty small in compressed form.

Also, regarding your comment: as I understand it, Ion at present does not compete with HiSeq on base accuracy or price. Nonetheless, Ion Torrent is a fast-evolving technology and can potentially produce longer reads, which are beneficial to many applications. It is promising. The other AB technology, SOLiD, is now (if not always) in an awkward position...

ADD COMMENT • link 13.8 years ago by lh3 33k

1

Years away, we'd better keep all the sequences in BWT/CSA (if the theory has reached that level). I see reference based compression as a transition phase rather than the end point. At present, only a few big centers replace fastq with BAM. More labs take fastq as the storage format.
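For readers unfamiliar with the BWT mentioned here, the following is a toy sketch of the Burrows-Wheeler transform and its inverse on a short DNA string. It is a naive version built by sorting rotations, meant only to show that the transform is reversible; real BWT/CSA indexes use suffix arrays and compressed structures, and the function names and the `$` sentinel are choices made for this illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


sequence = "GATTACAGATTACA"
transformed = bwt(sequence)
restored = inverse_bwt(transformed)
print(transformed)
print(restored == sequence)
```

The transform tends to group identical characters together, which is what makes the result compress well while still allowing the original sequence to be recovered.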

ADD REPLY • link 13.8 years ago by lh3 33k

1

storing BAMs only is a perfectly viable solution in my opinion, but while we can we are storing raw reads from the sequencer once we perform all the quality checks available and we are sure that the wet lab part of the experiment has performed well. but thinking about the future, we foresee that the best way of storing data, considering all the possible standards changing through time, would be by obtaining and directly storing more DNA sample ;)

ADD REPLY • link 13.8 years ago by Jorge Amigo 14k

0

Thanks! That's useful info about the intermediate format!

ADD REPLY • link 13.8 years ago by Samuel Lampa ★ 1.3k

0

Yeah, in the short term, centers will likely just be storing BAMs anyway, and in the long term (years away), we'll probably abandon storing reads altogether in favor of a smaller format that just details all the differences from a reference.
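The "differences from a reference" idea can be sketched in a few lines. This is a deliberately minimal illustration that handles substitutions only and assumes the alignment position is already known; it ignores indels, quality scores, and unmapped reads, and it is not how any particular file format works. All names are hypothetical.

```python
def encode_read(read: str, reference: str, pos: int) -> tuple:
    """Store a read as (start, length, substitutions) relative to the reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the offsets where the read disagrees with the reference.
    diffs = [(i, base) for i, (base, ref_base) in enumerate(zip(read, window))
             if base != ref_base]
    return pos, len(read), diffs


def decode_read(record: tuple, reference: str) -> str:
    """Rebuild the read from the reference plus the stored substitutions."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGCT"
read = "ACGTTAGACG"   # matches reference[4:14] except for one base
record = encode_read(read, reference, pos=4)
print(record)
print(decode_read(record, reference) == read)
```

A read that matches the reference closely collapses to a position, a length, and a handful of substitutions, which is where the space saving would come from.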

ADD REPLY • link 13.8 years ago by Chris Miller 22k