I am following the instructions on the Submit to GEO page to upload about 83 GB of gzipped RNA-seq data to the GEO FTP server. I first used the following command, but the connection timed out after every file:
ncftpput -B 33554432 -z -u 'username' -p 'password' -v -R \
ftp-private.ncbi.nlm.nih.gov /fasp/ local_folder_to_upload
I then wrapped it in the following script, which retries until all files are uploaded:
#!/bin/bash
# Re-run ncftpput until it exits with status 0 (all files uploaded).
cd /home/ec2-user || exit 1
try=0
COMPLETE_CONDITION=0
lastresult=-1   # placeholder until the first attempt has run
echo "START"
until [ "$lastresult" = "$COMPLETE_CONDITION" ]; do
    try=$((try + 1))
    echo "Try $try ..."
    # -z asks ncftpput to resume, so a retry does not start from scratch
    ncftpput -B 33554432 -z -u 'username' -p 'password' -v -R \
        ftp-private.ncbi.nlm.nih.gov /fasp/ local_folder_to_upload
    lastresult=$?
    echo "Last result code: $lastresult"
done
echo "UPLOAD COMPLETED AFTER $try TRY(S)"
exit 0
This worked in principle, and after several tries all samples were uploaded correctly to GEO. However, the error message persisted:
Could not read reply from control connection -- timed out.
Any thoughts on why this happens and how to resolve it? It does not appear to be critical, as all files seem to have been uploaded correctly.
This might not really solve the problem, but does GEO provide a way to get a hash such as SHA or MD5 for the uploaded files? If the checksums match, I would not worry too much. I would just want to make sure the files are not truncated.
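For the local side of such a check, here is a minimal sketch, assuming GNU find, sort, xargs, md5sum and gzip are available. The function names make_manifest and check_gzip are made up for illustration. Whether GEO reports checksums to compare against is not something this thread confirms, so this only records what the local files look like and tests that each archive is intact before upload.

```shell
# make_manifest DIR OUT: write one md5sum line per *.gz file under DIR to OUT.
# (Hypothetical helper; relies on GNU sort -z and xargs -0 -r.)
make_manifest() {
    local dir="$1" out="$2"
    # Sorting gives a stable order; NUL separators keep odd file names intact.
    find "$dir" -type f -name '*.gz' -print0 | sort -z | xargs -0 -r md5sum > "$out"
}

# check_gzip DIR: return 1 and list the files if any *.gz under DIR fails
# gzip's integrity test (a truncated archive fails this test).
check_gzip() {
    local dir="$1" bad
    # "! -exec ... -print" prints only the files for which gzip -t fails.
    bad=$(find "$dir" -type f -name '*.gz' ! -exec gzip -t {} \; -print 2>/dev/null)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad" | sed 's/^/CORRUPT: /'
        return 1
    fi
    return 0
}
```

With the folder name from the question, this could be called as make_manifest local_folder_to_upload manifest.md5 followed by check_gzip local_folder_to_upload. The manifest can later be re-verified with md5sum -c manifest.md5.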
Thank you very much for opening the question and for the bash script!
I had the same problem when uploading RNA-seq data to GEO. The connection broke several times and it was hard to submit all the fastq files. However, with your script it is working fine for the moment and I am not getting the error you mention.
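For anyone reusing the script, a bounded variant of the retry loop might be safer than looping forever if the server stays unreachable. The following is only a sketch: the helper name retry, the attempt limit and the pause between attempts are my own assumptions, not anything GEO or ncftpput prescribes.

```shell
# retry MAX DELAY CMD...: run CMD until it exits 0, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 0 on success,
# otherwise the exit code of the last failed attempt.
# (Hypothetical helper, not part of ncftp.)
retry() {
    local max="$1" delay="$2" try=1 rc
    shift 2
    while true; do
        rc=0
        "$@" || rc=$?          # capture the exit code without aborting
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        echo "Attempt $try of $max failed (exit code $rc)" >&2
        if [ "$try" -ge "$max" ]; then
            return "$rc"
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

It would wrap the original command unchanged, for example: retry 20 30 ncftpput -B 33554432 -z -u 'username' -p 'password' -v -R ftp-private.ncbi.nlm.nih.gov /fasp/ local_folder_to_upload. The values 20 and 30 are arbitrary and should be adjusted to the size of the upload.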