15 months ago
1769mkc
I'm trying to fetch data for a list of GSM IDs. The samples are present in the project folder, which I confirmed with the sra-explorer tool, but when I try to download them through a script it fails even after a number of retries.
I'm attaching the error log that was generated. I would like to know what exactly is failing here.
ncbi_error_report.txt
<Report>
<Run>
<Date>
<Start value="Wed Sep 27 2023 6:21:28 AM"/>
<End value="Wed Sep 27 2023 6:21:44 AM"/>
</Date>
<Home name="HOME" value="/root"/>
<Cwd>/tmp</Cwd>
<CommandLine argc="6">
<Arg index="0" value="fastq-dump"/>
<Arg index="1" value="-X"/>
<Arg index="2" value="1"/>
<Arg index="3" value="-Z"/>
<Arg index="4" value="--split-spot"/>
<Arg index="5" value="GSM2683458"/>
</CommandLine>
<Result rc="RC(rcVFS,rcMgr,rcOpening,rcDirectory,rcNotFound)"/>
<User admin="true"/>
</Run>
<Configuration>
<Files count="2">
<File name="/etc/ncbi/settings.kfg"/>
<File name="/root/.ncbi/user-settings.mkfg"/>
</Files>
<refseq state="not found"/>
<krypto state="pwfile: not found"/>
<sra>
<quality_type>raw_scores</quality_type>
</sra>
<Config>
<ConfigurationFiles>
/etc/ncbi/settings.kfg
/root/.ncbi/user-settings.mkfg
</ConfigurationFiles>
<APPNAME>"fastq-dump"</APPNAME>
<APPPATH>"/tmp/"</APPPATH>
<BUILD>"RELEASE"</BUILD>
<HOME>"/root"</HOME>
<HOST></HOST>
<LIBS>
<GUID>"119c217a-7b81-47e8-91d6-56d19c8c9f15"</GUID>
<IMAGE_GUID>"119c217a-7b81-47e8-91d6-62229c64ee59"</IMAGE_GUID>
</LIBS>
<NCBI_HOME>"/root/.ncbi"</NCBI_HOME>
<NCBI_SETTINGS>"/root/.ncbi/user-settings.mkfg"</NCBI_SETTINGS>
<OS>"linux"</OS>
<PWD>"/tmp"</PWD>
<USER></USER>
<VDB_CONFIG></VDB_CONFIG>
<VDB_ROOT></VDB_ROOT>
<kfg>
<arch>
<bits>"64"</bits>
<name>"56d19c8c9f15"</name>
</arch>
<dir>"/root/.ncbi"</dir>
<name>"user-settings.mkfg"</name>
</kfg>
<libs>
<cloud>
<report_instance_identity>"true"</report_instance_identity>
</cloud>
</libs>
<repository>
<user>
<ad>
<public>
<apps>
<file>
<volumes>
<flat></flat>
<flatAd>"."</flatAd>
</volumes>
</file>
<refseq>
<volumes>
<refseqAd>"."</refseqAd>
</volumes>
</refseq>
<sra>
<volumes>
<sraAd>"."</sraAd>
</volumes>
</sra>
<sraPileup>
<volumes>
<ad>"."</ad>
</volumes>
</sraPileup>
<sraRealign>
<volumes>
<ad>"."</ad>
</volumes>
</sraRealign>
<wgs>
<volumes>
<wgsAd>"."</wgsAd>
</volumes>
</wgs>
</apps>
<root>"."</root>
</public>
</ad>
</user>
</repository>
<sra>
<quality_type>"raw_scores"</quality_type>
</sra>
<vdb>
<lib>
<paths>
<kfg>"/usr/local/bin"</kfg>
</paths>
</lib>
</vdb>
</Config>
<RemoteAccess available="false"/>
<CurrentProtectedRepository found="false"/>
</Configuration>
<Object path="https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR5755657/SRR5755657" type="database" fs_type="unexpected">
<Dependencies>
<List count="22" missing="22">
<Dependency index="0" seq_id="NC_000067.5" local="false" path=""/>
<Dependency index="1" seq_id="NC_000068.6" local="false" path=""/>
<Dependency index="2" seq_id="NC_000069.5" local="false" path=""/>
<Dependency index="3" seq_id="NC_000070.5" local="false" path=""/>
<Dependency index="4" seq_id="NC_000071.5" local="false" path=""/>
<Dependency index="5" seq_id="NC_000072.5" local="false" path=""/>
<Dependency index="6" seq_id="NC_000073.5" local="false" path=""/>
<Dependency index="7" seq_id="NC_000074.5" local="false" path=""/>
<Dependency index="8" seq_id="NC_000075.5" local="false" path=""/>
<Dependency index="9" seq_id="NC_000076.5" local="false" path=""/>
<Dependency index="10" seq_id="NC_000077.5" local="false" path=""/>
<Dependency index="11" seq_id="NC_000078.5" local="false" path=""/>
<Dependency index="12" seq_id="NC_000079.5" local="false" path=""/>
<Dependency index="13" seq_id="NC_000080.5" local="false" path=""/>
<Dependency index="14" seq_id="NC_000081.5" local="false" path=""/>
<Dependency index="15" seq_id="NC_000082.5" local="false" path=""/>
<Dependency index="16" seq_id="NC_000083.5" local="false" path=""/>
<Dependency index="17" seq_id="NC_000084.5" local="false" path=""/>
<Dependency index="18" seq_id="NC_000085.5" local="false" path=""/>
<Dependency index="19" seq_id="NC_000086.6" local="false" path=""/>
<Dependency index="20" seq_id="NC_000087.6" local="false" path=""/>
<Dependency index="21" seq_id="NC_005089.1" local="false" path=""/>
</List>
</Dependencies>
</Object>
<SOFTWARE>
<VDBLibrary vers="2.7.47"/>
<Build static="true">
<Module name=""/>
</Build>
<Tool date="Nov 18 2022" name="fastq-dump" vers="3.0.1">
<Binary path="/usr/local/bin/fastq-dump" type="alias" md5="c461c39bfa514aff3c4f7c0416ced617">
<Alias resolved="fastq-dump.3">
<Alias resolved="fastq-dump.3.0.1">
<Alias resolved="sratools.3.0.1"/>
</Alias>
</Alias>
</Binary>
</Tool>
</SOFTWARE>
<Env>
</Env>
</Report>
Any suggestions or help would be really appreciated.
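The report is long, so it may help to pull out just the few lines that usually matter before reading the rest. Below is a minimal sketch, assuming the report was saved as plain text with one element per line as shown above; `summarize_report` is a hypothetical helper name, not part of sra-tools. For this report it would surface the `RC(rcVFS,rcMgr,rcOpening,rcDirectory,rcNotFound)` result, `RemoteAccess available="false"`, the resolved object `SRR5755657`, and the 22 missing reference dependencies.

```shell
# Sketch: print only the report lines that are usually worth a first look.
# summarize_report is a hypothetical helper name, not an sra-tools command.
summarize_report() {
  report="$1"   # path to ncbi_error_report.txt
  # Keep the result code, remote-access flag, resolved object, dependency
  # counts, and tool/library versions; strip any leading indentation.
  grep -E '<(Result|RemoteAccess|Object|List|Tool|VDBLibrary) ' "$report" |
    sed 's/^[[:space:]]*//'
}
```

Usage would be `summarize_report ncbi_error_report.txt`. It only filters text and does not interpret the codes.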
I don't think this log is helpful. Can't you just get FASTQ download links via sra-explorer.info or the tool mentioned by Rob in his answer here: Fetch Fastq files directly for SRA data?
Avoid SRA toolkit at all costs; it's a mess. If you're forced to use it, then use prefetch to download the sra files first and then use fastq-dump locally to convert the sra to fastq. Never fetch via fastq-dump directly; it's super picky and error-prone, as you're experiencing.
"you're forced to use it": this is sort of the case, since I have to use a Docker image and pass the GSM IDs as a list of inputs, first to check whether there are valid data files, and then it goes to the next step of making FASTQ files. Right now, strangely, this works for some project samples without any issue and for others it doesn't work at all, even though I added a few retries.
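For the prefetch-first suggestion above, a two-step download could look roughly like this. It is a sketch, assuming sra-tools 3.x is on the PATH, that an SRR run accession is passed rather than a GSM ID, and that both steps run from the same working directory; `fetch_then_dump` is a hypothetical function name.

```shell
# Sketch: download the run first, then convert it locally.
# fetch_then_dump is a hypothetical helper name.
fetch_then_dump() {
  run="$1"                        # run accession, e.g. SRR5755657
  # Step 1: download the run into ./$run/ in the current directory.
  prefetch "$run" || return 1
  # Step 2: convert. Given the bare accession from the same directory,
  # the toolkit should pick up the local copy made by prefetch.
  fastq-dump --split-files "$run"
}
```

Calling `fetch_then_dump SRR5755657` keeps the network step and the conversion step separate, so a retry only has to repeat the part that failed.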
Can you provide details of what you are doing and the commands you are using?
I will share the shell script, which is part of the pipeline, where I basically call the Docker image that contains the NCBI SRA Toolkit with a list of GSM IDs as input:
my code
Can you also provide an example of a GSM ID that fails? I was not aware that you could use GSM IDs directly with fastq-dump. I would think that you would need to get the SRA accessions for the GSM ID first. sra-explorer is doing that conversion, and perhaps that is why GSM IDs work there.
GSM2683458 is the one test case that fails, and GSM3603268 works fine.
It looks like if you use the GSM IDs directly with fastq-dump you end up with the following error (repeated 3x), though the retrieval seems to work.
Mapping the GSM ID to the SRR accession first does not generate this error. Files (I only recovered a couple of reads) by both methods appear to be identical.
fastq-dump SRR5755657
works without errors.
So how do I map on the go when I have GSM IDs as input?
I tried this to check
Did you run vdb-config -i to set up a temp directory for use with sratoolkit? The error above is for that.
You can map GSM IDs to SRA accessions using EntrezDirect (LINK). Using GSM IDs directly may be tricky since some IDs may map to more than one SRA ID.
NCBI is the problem, because the people making decisions there lack the minimal common sense and understanding of the problems they are trying to solve.
When someone needs to download a simple file, they shouldn't need to run config this or config that, and they shouldn't need to install some obtuse, buggy, overcomplicated, and inefficient tool like fastq-dump.
fastq-dump and the way SRA works demonstrate the disconnect and complete lack of accountability at the highest levels. It has been like this for perhaps two decades, and all along it has been and continues to be a bottleneck to science.
The choices made by NCBI are the problem.
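Coming back to the EntrezDirect suggestion earlier in the thread, mapping a GSM ID to its run accessions on the go could be sketched as below. This assumes EDirect (`esearch`, `efetch`) is installed and that the run accession sits in the first column of the `runinfo` CSV; `gsm_to_runs` and `runs_from_runinfo` are hypothetical helper names.

```shell
# Sketch: map one GSM ID to its SRA run accessions with EDirect.
# Both function names are hypothetical helpers, not EDirect commands.

# Keep only the first CSV column and drop header or blank lines.
runs_from_runinfo() {
  cut -d',' -f1 | grep -E '^[SED]RR[0-9]+$'
}

gsm_to_runs() {
  gsm="$1"   # e.g. GSM2683458
  # </dev/null stops esearch from consuming the caller's stdin, which
  # matters when this is called inside a "while read" loop.
  esearch -db sra -query "$gsm" </dev/null |
    efetch -format runinfo |
    runs_from_runinfo
}
```

Because one GSM ID may map to more than one run, the output is one accession per line, and a pipeline would loop over those lines rather than expect a single value.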
https://hub.docker.com/r/ncbi/sra-tools is the image I'm using. Regarding "Did you run vdb-config -i to set up a temp directory for use with sratoolkit?": I did that on a standalone system, where it brings up the GUI and we can see the settings, but in the case of the image, what am I supposed to configure, and how?
Have you seen: https://github.com/ncbi/sra-tools/wiki/SRA-tools-docker
You could try the solution mentioned here: https://github.com/ncbi/sra-tools/issues/630
I got it working after I updated the SRA Lite option while using vdb-config -i.
The output I see looks like this:
You may want to use --split-files instead. With your --split-spot option it looks like you end up with interleaved data files. Unless you are dealing with that internally, it is safer to get regular R1/R2 files.
Thank you for the resource. I will look at it, try to adopt the fix, and see how it works.
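On the interleaved output point: if a pipeline already has an interleaved stream from `--split-spot -Z`, it could be split into R1/R2 files afterwards. This is a rough sketch, assuming every spot has exactly two reads and every record is exactly four lines; `deinterleave` is a hypothetical helper name. Using `--split-files` in the first place avoids the need for it.

```shell
# Sketch: split an interleaved FASTQ stream on stdin into two mate files.
# Assumes strictly paired data and 4-line records; deinterleave is a
# hypothetical helper name.
deinterleave() {
  # $1 = output path for mate 1, $2 = output path for mate 2
  awk -v r1="$1" -v r2="$2" '{
    rec = int((NR - 1) / 4)           # 0-based record index
    out = (rec % 2 == 0) ? r1 : r2    # even records -> mate 1, odd -> mate 2
    print > out
  }'
}
```

For example, `fastq-dump -Z --split-spot SRR5755657 | deinterleave R1.fastq R2.fastq` would write the two mate files, under the same pairing assumption.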
When I ran this for the successful GSM ID, GSM3603268, with its respective SRA accession, I saw this output, which was not the case for the one above: