Hi, I have a list of assembly accession numbers like this: GCF_000421605.1 GCF_001652585.1 GCF_012317585.1 GCF_011207455.1
How can I automatically get the sample details of these assemblies as a list?
You can do this using the NCBI Datasets command-line tool. First, make a list of all the assembly accessions and use NCBI Datasets to download a package (we use the --dehydrated flag to fetch only metadata and skip all sequence and annotation data). Then, use the dataformat tool to convert the assembly report from JSON Lines (jsonl) to a tabular format.
## input file
$ cat accs.txt
GCF_000421605.1
GCF_001652585.1
GCF_012317585.1
GCF_011207455.1
## download using datasets
$ datasets download genome accession --inputfile accs.txt --dehydrated
## contents of the datasets package
$ unzip -v ncbi_dataset.zip
Archive: ncbi_dataset.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
1604 Defl:N 769 52% 2022-06-04 08:47 3de26d82 README.md
12987 Defl:N 3514 73% 2022-06-04 08:47 53d1bc19 ncbi_dataset/data/assembly_data_report.jsonl
4497 Defl:N 782 83% 2022-06-04 08:47 af0e24e7 ncbi_dataset/fetch.txt
2522 Defl:N 385 85% 2022-06-04 08:47 38db1709 ncbi_dataset/data/dataset_catalog.json
-------- ------- --- -------
21610 5450 75% 4 files
## extract metadata into a table using dataformat
$ dataformat tsv genome --package ncbi_dataset.zip > assm_tbl.txt
Note that the output table assm_tbl.txt is quite big, with 95 fields, but it should be feasible to load the file into a spreadsheet app and filter the data as needed. Alternatively, if you are interested in only a specific set of BioSample attributes, you can use dataformat to extract only those. Finally, if you are comfortable with JSON, you can use a tool like jq (https://stedolan.github.io/jq/) to conditionally extract only certain fields.
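If jq is not available, the same kind of field extraction can be sketched in Python with only the standard library. This is a rough illustration, not part of the Datasets tooling: it reads ncbi_dataset/data/assembly_data_report.jsonl straight from the zip and prints a few fields as tab-separated text. The key paths in EXAMPLE_PATHS are assumptions (the report layout differs between Datasets versions), so inspect one line of your own report and adjust them.

```python
import io
import json
import os
import zipfile

# Path of the report inside the package, as shown in the unzip listing.
REPORT = "ncbi_dataset/data/assembly_data_report.jsonl"

# Hypothetical key paths: confirm them against one line of your own report.
EXAMPLE_PATHS = [
    "accession",
    "organism.organismName",
    "assemblyInfo.biosample.accession",
]


def get_path(record, dotted_path, default=""):
    """Follow a dotted key path such as 'a.b.c' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def read_report(package="ncbi_dataset.zip", member=REPORT):
    """Yield one dict per non-empty line of the JSON Lines report in the zip."""
    with zipfile.ZipFile(package) as archive:
        with archive.open(member) as handle:
            for line in io.TextIOWrapper(handle, encoding="utf-8"):
                if line.strip():
                    yield json.loads(line)


def extract_rows(records, paths):
    """Return one list of strings per record, one entry per requested path."""
    return [[str(get_path(record, path)) for path in paths] for record in records]


# Only runs when the package is present in the current directory.
if os.path.exists("ncbi_dataset.zip"):
    print("\t".join(EXAMPLE_PATHS))
    for row in extract_rows(read_report(), EXAMPLE_PATHS):
        print("\t".join(row))
```

Missing keys come out as empty strings rather than raising an error, which makes it easier to spot a wrong path when the report schema does not match the assumed one.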
Use NCBI E-utilities: https://www.ncbi.nlm.nih.gov/books/NBK25500/
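A minimal sketch of what the E-utilities route could look like, assuming the usual two-step pattern of an esearch against the assembly database followed by an esummary on the returned UIDs. The helper names (eutils_url, fetch_json, assembly_summary) are made up for this example, and the exact keys inside each summary record (for instance, the one holding the BioSample accession) should be checked in the returned JSON rather than taken for granted.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Build an E-utilities URL for a tool such as 'esearch' or 'esummary'."""
    params.setdefault("retmode", "json")
    return BASE + tool + ".fcgi?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def assembly_summary(accession):
    """Return the esummary record(s) that esearch finds for one accession."""
    search = fetch_json(eutils_url("esearch", db="assembly", term=accession))
    uids = search["esearchresult"]["idlist"]
    if not uids:
        return []
    summary = fetch_json(eutils_url("esummary", db="assembly", id=",".join(uids)))
    # Each record is a dict; inspect its keys to find the sample-related fields.
    return [summary["result"][uid] for uid in uids]
```

Calling assembly_summary("GCF_000421605.1") performs two network requests. When looping over a long accession list, pause between calls (NCBI asks for at most three requests per second without an API key).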