Hello everyone,
I would like to download the cDNA dataset from this published paper. In the paper, the cDNA sequences were submitted to the EMBL database. I was not able to download them in FASTA format from EMBL, but it worked through NCBI. Here are the accession numbers for the submitted data, which can be accessed at NCBI: Oryza sativa ssp. indica cv. Guangluai 4 full-length cDNAs (10,096)
CT827960-CT834770, CT836522-CT836598, CT837477-CT837976, CT834771-CT836521, CT827880-CT827943, CT836599-CT837476
I tried downloading them manually from the NCBI website. I want to use a script or the command line to download the 10,096 cDNA sequences above automatically, because downloading them by hand is not practical. I wrote a script to download the dataset, but it does not work as I expected: the downloaded files are log files, not FASTA files.
Here is my script:
#!/bin/bash
start_accession=827960
end_accession=834770
for (( i=$start_accession; i<=$end_accession; i++ )); do
    genbank_number="$i.1"
    download_url="https://www.ncbi.nlm.nih.gov/search/api/download-sequence/?db=nuccore&id=$genbank_number"
    echo "Downloading genbank number $genbank_number..."
    wget "$download_url"
    echo "Download of genbank number $genbank_number completed."
done
Does anyone have any guidance for me in this case? Thank you everyone for any support.
Thank you so much for your valuable guidance. I tried it and it worked.
I attached the adjusted script in case someone needs it.
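The attachment is not reproduced in this thread. For anyone looking for a starting point, here is a minimal sketch of one possible approach (not necessarily the script mentioned above). It assumes the NCBI E-utilities efetch endpoint with db=nuccore, rettype=fasta and retmode=text, and it keeps the CT prefix on each accession. The function names (expand_range, efetch_url, download_all), the batch size of 100 and the one-second pause are my own illustrative choices, not something from the paper or from NCBI.

```bash
#!/bin/bash
# Sketch only: expand accession ranges and fetch FASTA through NCBI efetch.
# Function names, batch size and sleep interval are illustrative assumptions.

EFETCH="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
RANGES="CT827960-CT834770 CT836522-CT836598 CT837477-CT837976 CT834771-CT836521 CT827880-CT827943 CT836599-CT837476"

# Print every accession in a range such as CT827880-CT827943, one per line.
expand_range() {
    local first=${1%-*} last=${1#*-}
    local prefix=${first%%[0-9]*}      # letters before the digits, e.g. CT
    local start=${first#"$prefix"} end=${last#"$prefix"}
    local width=${#start} i            # keep the original number of digits
    for (( i=10#$start; i<=10#$end; i++ )); do
        printf "%s%0${width}d\n" "$prefix" "$i"
    done
}

# Build an efetch URL for a comma-separated list of accessions.
efetch_url() {
    printf '%s?db=nuccore&id=%s&rettype=fasta&retmode=text\n' "$EFETCH" "$1"
}

# Fetch all ranges in batches and append everything to one FASTA file.
# Nothing is downloaded until this function is called explicitly.
download_all() {
    local out=$1 batch_size=${2:-100} r batch
    : > "$out"                         # create or truncate the output file
    for r in $RANGES; do expand_range "$r"; done |
        xargs -n "$batch_size" | tr ' ' ',' |
        while read -r batch; do
            wget -q -O - "$(efetch_url "$batch")" >> "$out"
            sleep 1                    # pause between requests to be polite to NCBI
        done
}
```

To use it, source the script and run something like `download_all guangluai4_cdna.fasta` (the file name is just an example). Requesting the accessions in batches keeps the number of requests to about a hundred instead of one per sequence.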
Pierre Lindenbaum - I recently got some clarity on GIs from the esummary docs page, which clarifies that:
This had been a point of confusion for me, so I wanted to share!
VL