Hi All, I tried to submit paired-end fastq files (R1 and R2) for a sample to the NCBI SRA database. I took the following steps:
1) Created a bioproject profile by following the link: https://submit.ncbi.nlm.nih.gov/subs/bioproject/SUB4422178/submitter. Filled in everything and submitted to get SAMN and PRJNA IDs, then selected FTP uploads.
2) Then went to https://submit.ncbi.nlm.nih.gov/subs/sra/. First I moved both R1 and R2 files to a separate directory and changed (cd) to that directory. Then in the terminal typed:
ftp -i
open ftp-private.ncbi.nlm.nih.gov
Then, at the prompt, typed the username and password from the link in step 2:
Username: subftp
Password: w*******
Then changed to the account folder from the link in step 2:
cd uploads/amyname@gmail.com_00YmVxw2
3) Created a new directory as shown below:
mkdir rhizophagus
cd rhizophagus
4) Then transferred both fastq files to the NCBI directory by typing:
mput *
After the transfer was complete, I typed ls to see all files that had been transferred.
5) Once the files were available, I used the upload-folder option at https://submit.ncbi.nlm.nih.gov/subs/sra/ to select the folder and submit it.
Both of these files are now online here https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP158305. However, when I tried to download these files using
prefetch --option-file sratest.txt
and extract them with
fastq-dump --split-files SRP158305
I got two fastq files, but one is 12.2 GB and the other is only 303.3 MB. Each original fastq (R1 and R2) is 10.9 GB. I am not sure how the files should have been submitted, and I would really appreciate it if somebody could help me figure out where it went wrong. Thanks in advance for your help.
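A quick way to quantify the mismatch described above is to count the reads in each mate file, both in the originals and in the files written by fastq-dump. The sketch below assumes standard four-line FASTQ records; count_reads and check_pair are illustrative helper names, not part of the SRA Toolkit.

```shell
# Count reads in a FASTQ file, assuming 4 lines per record.
# Files ending in .gz are decompressed on the fly.
count_reads() {
    if [ "${1##*.}" = "gz" ]; then
        lines=$(gzip -cd -- "$1" | wc -l)
    else
        lines=$(wc -l < "$1")
    fi
    echo $((lines / 4))
}

# Print both read counts; succeed only if they are equal.
check_pair() {
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    echo "R1: $r1 reads, R2: $r2 reads"
    [ "$r1" -eq "$r2" ]
}
```

For example, check_pair SRR7716298_1.fastq SRR7716298_2.fastq compares the two files that fastq-dump --split-files writes for that run. Properly paired files should report the same count.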
If you go to the link above they appear to be similar sized.
Did you upload them uncompressed? Perhaps SRA has already converted them (to .sra) and/or compressed them further.
@genomax Yes, I did not compress them. I just uploaded the two fastq files. Do I have to compress them and make one compressed file before uploading?
And what's the procedure to remove/update already submitted files? Is it possible to remove them and resubmit?
Doing
fastq-dump --split-files SRR7716298
seems to recover asymmetrically sized files, as you posted above. I suggest that you email SRA support to let them know what is happening and ask them to reset your submission so you can re-upload the data.
Thanks. So when I submit it again, should I just compress both R1 and R2 and make one compressed file and submit?
To further clarify, I am seeing this when I do
fastq-dump --split-files SRR7716298
What does that mean? Could you please clarify?
Either a file or the SRA record must have become corrupt. I assume this is original raw data?
That's right, these are raw data. I have emailed SRA support to reset it. So when I re-upload the files, should I just compress both files and submit as one compressed file?
Compress and submit them as a pair.
Sorry, still confused: should I submit one compressed file (with both R1 and R2) or two individually compressed files?
gzip each file separately and submit as two files
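The advice above could be carried out roughly as follows. This is only a sketch, assuming plain-text FASTQ inputs; compress_pair is a hypothetical helper and the file names in the usage note are placeholders. It writes a separate .gz next to each input, keeps the originals, and runs gzip -t to confirm each compressed file is readable before upload.

```shell
# Compress each mate file on its own and verify the result.
compress_pair() {
    for f in "$1" "$2"; do
        # -c writes to stdout, so the original file is left in place
        gzip -c -- "$f" > "$f.gz" || return 1
        # -t tests the integrity of the compressed file
        gzip -t -- "$f.gz" || return 1
    done
}
```

For example, compress_pair sample_R1.fastq sample_R2.fastq would produce sample_R1.fastq.gz and sample_R2.fastq.gz, which are then uploaded as two files. Since .gz files are binary, it is probably worth typing binary at the ftp prompt before mput.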
Thanks, I will give it a try.