Solutions for genomic dataset storage?
7.6 years ago
DavidStreid ▴ 90

Hi all. I'd appreciate any maintainable solutions for storing genomic data as it scales to tens or hundreds of terabytes, e.g. external hard drives or cloud storage. Any comments on what you are using and why would be great. Thanks in advance!

genome sequence storage cloud • 1.8k views

Tape. Because it's cheap.


Compression and encoding. Algorithms will always beat hardware.

7.6 years ago
GenoMax 148k

I second tape. You can fit 4-6TB per LTO-6 tape with hardware compression.

But if you don't have tape infrastructure available and are starting from scratch, then the cheapest archival cloud storage (Google Coldline, Amazon Glacier) could work. It is only cost-effective if you rarely access the data, since these tiers charge for retrieval.
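
To put the 4-6 TB per LTO-6 tape figure in context for the dataset sizes in the question, here is a rough sketch of the arithmetic. The function name `tapes_needed` and the example sizes are made up for illustration; the only numbers taken from the thread are the 4 and 6 TB per tape bounds.

```python
import math


def tapes_needed(dataset_tb: float, tb_per_tape: float) -> int:
    """Return how many tapes are needed to hold dataset_tb, rounding up."""
    if dataset_tb < 0 or tb_per_tape <= 0:
        raise ValueError("dataset_tb must be >= 0 and tb_per_tape must be > 0")
    return math.ceil(dataset_tb / tb_per_tape)


# Hypothetical dataset sizes spanning "tens to hundreds of terabytes".
for size_tb in (10, 100, 500):
    fewest = tapes_needed(size_tb, 6.0)  # optimistic: 6 TB fits on each tape
    most = tapes_needed(size_tb, 4.0)    # pessimistic: 4 TB fits on each tape
    print(f"{size_tb} TB: between {fewest} and {most} tapes")
```

Keep in mind that files which are already compressed (gzipped FASTQ, BAM) tend to gain little from hardware compression, so planning around the native LTO-6 capacity of 2.5 TB per tape may be the safer estimate.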

7.6 years ago
igor 13k

Theoretically, you could store the data in CyVerse, which is even cheaper than tape for academic use:

CyVerse's cloud-based data storage is optimized for large data, free to most scientific researchers, accessible through multiple interfaces, and leaves access control in the hands of the data owners.

From: http://www.cyverse.org/data-policy

As with all cloud services, the service could shut down or change pricing or just lose/corrupt your files at any point.
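
One way to at least detect the lost or corrupted files mentioned above is to keep a checksum manifest locally and compare against it after any download or restore. This is only a sketch using Python's standard library; the helper names (`sha256_of`, `build_manifest`, `changed_files`) are made up for illustration and have nothing to do with CyVerse or any other provider's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory) -> dict:
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(directory, manifest) -> list:
    """List manifest entries that are now missing or whose contents differ."""
    current = build_manifest(directory)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

You would call `build_manifest` before uploading, store the result somewhere separate from the data, and run `changed_files` on a restored copy. An empty list means every file recorded in the manifest came back intact.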
