I am requesting data from NCBI, and it looks like it only allows three requests per second, so I wanted to parallelize the requests so that three query IDs from ${IDLIST} are sent per second. I would like to know how I can set a sleep time of 2 seconds in this code. I know that in a for-loop we can just do sleep 2, but what's the syntax to do this with parallel?
For example, if I run it for just three IDs, as below (head -3 "${IDLIST}"), the download request works:
parallel -j1 \
"IFS=$'\n';"'for hit in \
$(esearch -db sra -query {} | efetch --format runinfo | grep SRR); do \
echo "{},${hit}"; done' \
::: "$(head -3 "${IDLIST}")" \
| sort -t, -k9,9rn >> out.csv
But it won't work for the full list:
parallel -j1 \
"IFS=$'\n';"'for hit in \
$(esearch -db sra -query {} | efetch --format runinfo | grep SRR); do \
echo "{},${hit}"; done' \
:::: "${IDLIST}" \
| sort -t, -k9,9rn >> out.csv
Is there a way to limit this code to three requests per second?
These are some of the IDs in IDLIST:
A-ADC-AD000037-BR-NCR-09AD14648
A-ADC-AD000044-BR-NCR-09AD14647
A-ADC-AD000068-BR-NCR-08AD8038
A-ADC-AD000075-BR-NCR-08AD9964
A-ADC-AD000092-BR-NCR-09AD13601
A-ADC-AD000096-BR-NCR-08AD9891
A-ADC-AD000097-BR-NCR-08AD9961
A-ADC-AD000104-BR-NCR-09AD14644
You are only going to make this worse. NCBI counts the queries per IP address. Have you signed up for an NCBI_API_KEY? If not, you should do that first. Ultimately, NCBI counts the number of requests per domain at a higher level (if I recall correctly).
NCBI may have some of this information available in the form of reports. Look around in ftp://ftp.ncbi.nlm.nih.gov/sra/reports/Metadata/. If you have a really large number of queries, you can download the files and parse the info locally.
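If the queries are run one at a time rather than in parallel, the pause the question mentions for a for-loop can be sketched as below. This is only an illustration under stated assumptions: run_throttled is a hypothetical helper name, the delay value is up to you, and standard input is redirected from /dev/null for each command as a precaution in case the command itself reads standard input and would otherwise swallow the remaining IDs.

```shell
# Hypothetical helper (not part of EDirect or GNU parallel):
# run_throttled DELAY CMD [ARGS...]
# Reads one ID per line from standard input, runs "CMD ARGS... ID" for each
# non-empty line, and sleeps DELAY seconds after every call.
run_throttled() (
    delay="$1"
    shift
    while IFS= read -r id; do
        # Skip blank lines in the ID list.
        [ -n "$id" ] || continue
        # Keep the command from consuming the rest of the ID list.
        "$@" "$id" < /dev/null
        sleep "$delay"
    done
)

# Example with a harmless stand-in command; prints each ID, one per second.
printf '%s\n' A-ADC-AD000037-BR-NCR-09AD14648 A-ADC-AD000044-BR-NCR-09AD14647 |
    run_throttled 1 echo "querying"
```

To use it with the real query, the esearch | efetch pipeline from the question could be wrapped in a small shell function and passed in place of echo, with the ID file supplied on standard input.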
@genomax I couldn't find anything older than "NCBI_SRA_Metadata_20181202.tar.gz". I need this from 201802. I just created the api_key and exported the variable with export api_key="key", but that still won't solve the problem. Where do I add this key? Thank you for your help.
Add the key to your .bashrc file for automatic export, or you can export it in the terminal where you are going to run the searches from each time. Export NCBI_API_KEY as the variable.
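As a minimal sketch of that step, the lines below export the variable under the name NCBI_API_KEY and check that it is set. The value your_key_here is a placeholder, not a real key; the same export line can be placed in .bashrc so that every new terminal picks it up.

```shell
# Placeholder value: substitute the key generated in your NCBI account.
export NCBI_API_KEY="your_key_here"

# Confirm the variable is set in this shell before running the searches.
if [ -n "${NCBI_API_KEY:-}" ]; then
    echo "NCBI_API_KEY is set"
else
    echo "NCBI_API_KEY is not set" >&2
fi
```

Note that the name matters: a variable exported as api_key will not be found by anything that looks for NCBI_API_KEY.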