I love how NCBI has a simple FTP server setup (ftp://ftp.ncbi.nih.gov/) where I can access data from GEO, GenBank, et cetera.
I was just curious: if I were to download all of that data, how much space would I need? My guess is that it would be in the petabytes or exabytes.
Does NCBI publish any kind of stat like that?
I guess you could write an FTP crawler and add up the file sizes, or open an SFTP connection and check the size of the top-level folder, if that's even possible.
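Something like this might work as a starting point. It's a rough, untested sketch: it assumes the server supports MLSD directory listings and anonymous login, and the `/genbank` path and depth cap are just placeholders for a smoke test, not the full crawl.

```python
import ftplib

HOST = "ftp.ncbi.nih.gov"

def dir_size(ftp, path, depth=0, max_depth=2):
    """Recursively sum file sizes under `path` using MLSD listings.

    max_depth caps the recursion so a test run doesn't crawl the
    whole tree; raise it (and expect a long wait) for a full total.
    """
    total = 0
    try:
        entries = list(ftp.mlsd(path))
    except ftplib.error_perm:
        return 0  # unreadable directory, or server refused MLSD here
    for name, facts in entries:
        if name in (".", ".."):
            continue
        if facts.get("type") == "dir" and depth < max_depth:
            total += dir_size(ftp, f"{path}/{name}", depth + 1, max_depth)
        elif facts.get("type") == "file":
            total += int(facts.get("size", 0))
    return total

with ftplib.FTP(HOST) as ftp:
    ftp.login()  # anonymous login
    # Smoke test on one top-level directory rather than "/"
    tb = dir_size(ftp, "/genbank") / 1e12
    print(f"/genbank (to depth 2): ~{tb:.2f} TB")
```

If MLSD isn't supported, you'd have to fall back to parsing `LIST` output, which is messier since the format varies by server.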
I was thinking of doing something like that. I know they don't support SFTP.
I haven't done any FTP crawling before, but I imagine the mechanics are straightforward; what I don't know is how feasible it is at this scale. If there were 1B folders and I did 1k requests a second, it would take about 12 days. Perhaps there are a few simple commands and the whole thing would take an hour. Perhaps I'd be rate limited to 100 requests per second and it would take nearly four months :).
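For what it's worth, the estimate is just listing count divided by request rate. A quick sanity check (the billion-folder figure is a made-up worst case, not a measurement):

```python
# Back-of-envelope crawl times under different rate-limit assumptions.
listings = 1_000_000_000  # hypothetical number of directory listings needed

for rate in (1_000, 100):  # requests per second
    days = listings / rate / 86_400  # 86,400 seconds per day
    print(f"{rate:>5} req/s -> about {days:,.0f} days")
```

That prints roughly 12 days at 1k req/s and 116 days at 100 req/s, so the rate limit dominates everything else.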
Figured asking around first would be a good place to start.
Asking around is always better than jumping in head first. Good luck!