I'm trying to download transcriptomes from NCBI, one of which can be seen here: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/fungi/Saccharomyces_cerevisiae/latest_assembly_versions/GCF_000146045.2_R64/GCF_000146045.2_R64_rna_from_genomic.fna.gz
and am trying to use NCBI's efetch to help. I've tried
efetch -db protein -id GCF_000146045.2 -format fasta
efetch -db gds -id GCF_000146045.2_R64
and numerous iterations thereof, but to no avail.
I've read through https://www.ncbi.nlm.nih.gov/books/NBK179288/ and https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Writing_Advanced_Sea but nothing relevant seems to turn up.
I'm aware that this can be downloaded through the browser, or through wget with the link, but I need to script this to avoid errors and get the links. Preferably, all I would need to enter is the species and rRNA.
How can I use efetch, or some other tool, to download this data at the command line?
Thank you! For reference, the programs can be downloaded from https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/
Is there a way of downloading the data without having to know the accession, i.e. just using the species name?
I think one way would be to use the taxon ID for the species you are interested in. One could always search for the accessions using EntrezDirect (which it is good at) and then use those with datasets.
Yes, you can use the species name. You need to use the taxon option and provide a scientific name, common name, or NCBI taxid:
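A minimal sketch of such a call, assuming a recent datasets release (v14 or later). The function name fetch_rna, the DRY_RUN switch, and the default output file name are hypothetical and only for illustration; the flags --reference, --include rna, and --filename should be checked against `datasets download genome taxon --help` for your installed version:

```shell
# Sketch: download the RNA FASTA for a taxon with NCBI's datasets CLI.
#   $1  scientific name, common name, or NCBI taxid
#   $2  output zip file (optional; the default name is a hypothetical choice)
# Set DRY_RUN=1 to print the command instead of running it.
fetch_rna() {
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty,
    # so the same line either prints the command or executes it.
    ${DRY_RUN:+echo} datasets download genome taxon "$1" \
        --reference \
        --include rna \
        --filename "${2:-rna_dataset.zip}"
}
```

For example, `DRY_RUN=1 fetch_rna "Saccharomyces cerevisiae" sc_rna.zip` prints the command that would be run, and dropping DRY_RUN=1 performs the download (this needs network access and datasets on the PATH). The --reference flag is there to restrict the result to reference genomes rather than every assembly under the taxon.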
Please let me know if you have any questions.
MirianT_NCBI: Can the preview option actually list/show the accessions that would be included? As is, the information shown is not very useful.
GenoMax, you can check the list of accessions using a different command:
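One way this could look, as a sketch that assumes the summary subcommand of datasets piped into the companion dataformat tool. The function names list_genomes and accessions_only are hypothetical, and the field names accession and organism-name should be checked against `dataformat tsv genome --help`:

```shell
# Summarize the genomes for a taxon and print accession plus organism name as TSV.
# Needs network access and both datasets and dataformat on the PATH.
list_genomes() {
    datasets summary genome taxon "$1" --as-json-lines |
        dataformat tsv genome --fields accession,organism-name
}

# Keep only the first (accession) column, dropping the header row
# that dataformat prints by default.
accessions_only() {
    tail -n +2 | cut -f1
}

# Usage:
#   list_genomes "Saccharomyces cerevisiae" | accessions_only
```

Keeping the column filter separate from the network call means the accession list can be fed straight into a loop or into a `datasets download genome accession` call.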
The idea of the --preview flag under the download option is to give users an idea of the data package size. But I can see how it would be useful to have other info there. Is there anything else you would like to see? I can pass the suggestions to the team :) Thanks!