How to download *_cds_from_genomic.fna.gz (CDS from genomic FASTA) from NCBI for pangenome analysis?
You can use ncbi-genome-download. The instructions for installation and usage are available here: https://github.com/kblin/ncbi-genome-download
For what you are asking, you need to set the -F parameter to 'cds-fasta'.
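As a rough sketch of what that could look like, the function below wraps one ncbi-genome-download call. The helper name download_cds, the RefSeq section, the "complete" assembly level, the example genus and the cds_downloads output folder are all illustrative assumptions, not part of the original answer; adjust them to your organisms.

```shell
# Hypothetical helper: fetch CDS FASTA files for one genus with ncbi-genome-download.
# $1 = taxonomic group (e.g. bacteria), $2 = genus or organism name to filter on
download_cds() {
    group="$1"
    genus="$2"
    # -s: NCBI section, -F: file format, -l: assembly level,
    # -g: genus filter, -o: output folder (all values here are assumptions)
    ncbi-genome-download \
        -s refseq \
        -F cds-fasta \
        -l complete \
        -g "$genus" \
        -o cds_downloads \
        "$group"
}
```

You would then call it as, for example, download_cds bacteria "Escherichia coli". The downloaded *_cds_from_genomic.fna.gz files should end up in per-assembly subfolders under the output folder.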
Hi,
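You can use NCBI Datasets. The default genome package includes more than the CDS: judging by the exclusion flags in the command below, it also contains the GFF3 annotation, protein, RNA and genomic sequence files.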
To download only the CDS, you can use the following command (I'm using human as an example, but you can use any taxonomic level):
datasets download genome taxon human \
--exclude-gff3 --exclude-protein --exclude-rna --exclude-seq \
--filename cds_only.zip
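Note that the --exclude-* flags belong to older releases of the datasets command-line tool. Newer releases select file types with --include instead, so if the command above is rejected, something like the sketch below may work. The helper name download_cds_only is just an illustration; check datasets download genome taxon --help for the options your installed version accepts.

```shell
# Hypothetical helper: download only the CDS FASTA for a taxon,
# assuming a datasets release that supports the --include option.
# $1 = taxon name or NCBI Taxonomy ID (e.g. human)
download_cds_only() {
    taxon="$1"
    datasets download genome taxon "$taxon" \
        --include cds \
        --filename cds_only.zip
}
```

For example, download_cds_only human would write cds_only.zip to the current directory.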
If you're downloading a very large number of files (say, all vertebrates), I would recommend adding the --dehydrated flag. With this flag, datasets downloads the JSON and JSONL files, and a file called fetch.txt with paths to the data to be downloaded (rehydrated). To rehydrate a package, you can follow the steps below:
unzip cds_only.zip -d cds_only
datasets rehydrate --directory cds_only
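Once the package is unzipped (and rehydrated, if needed), pangenome tools usually want one file per genome in a single folder. Here is a minimal sketch for that step, assuming the usual package layout of ncbi_dataset/data/<accession>/cds_from_genomic.fna; the helper name collect_cds and the output folder name are hypothetical.

```shell
# Hypothetical helper: copy each per-assembly CDS FASTA into one flat folder,
# prefixing the file name with the assembly accession.
# $1 = unzipped/rehydrated package directory, $2 = output directory
collect_cds() {
    pkg_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for fasta in "$pkg_dir"/ncbi_dataset/data/*/cds_from_genomic.fna; do
        [ -e "$fasta" ] || continue   # skip if the glob matched nothing
        # The parent directory name is the assembly accession
        accession=$(basename "$(dirname "$fasta")")
        cp "$fasta" "$out_dir/${accession}_cds_from_genomic.fna"
    done
}
```

With the directory from the steps above, that would be collect_cds cds_only cds_flat.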
I hope it helps!
What have you tried? Please add more detail to your question.