You can now get gene ortholog data with the NCBI Datasets command-line tool by specifying a gene ID, gene symbol, or RefSeq nucleotide or protein accession. Data are available for vertebrates and insects.
You can retrieve metadata for gene orthologs in JSON format, or you can download a compressed (zip) archive containing both metadata and sequences.
For example, if you want the mammalian orthologs of the human BRCA1 gene, you can use the following summary command to get metadata for these genes:
datasets summary ortholog symbol BRCA1 --taxon human --taxon-filter mammals > brca1-mammals.json
The gene metadata includes gene names and synonyms, genomic coordinates, RefSeq transcript and protein data, as well as Ensembl and UniProt accessions and other gene information.
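If nested JSON is hard to scan, one option is to flatten it into one "path, value" pair per line. The Python sketch below is only an illustration: it assumes the file written by the summary command above (brca1-mammals.json) holds a single JSON document, it makes no assumption about the field names inside it, and the function name flatten_json is made up for this example.

```python
import json
import sys


def flatten_json(value, prefix=""):
    """Yield (dotted_path, scalar) pairs for every leaf in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_json(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_json(child, f"{prefix}[{index}]")
    else:
        # Leaf value (string, number, boolean, or null).
        yield prefix, value


if __name__ == "__main__":
    # Hypothetical usage: python flatten.py brca1-mammals.json
    with open(sys.argv[1]) as handle:
        document = json.load(handle)
    for path, scalar in flatten_json(document):
        print(f"{path}\t{scalar}")
```

The tab-separated output can be searched with grep or opened in a spreadsheet, which makes it easier to find the fields of interest before writing anything more specific.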
If you want the sequences, use the datasets download command to download a zip archive that includes gene, transcript, and protein sequences as well as metadata in tabular and JSON lines formats:
datasets download ortholog symbol BRCA1 --taxon human --taxon-filter mammals --filename brca1-sequences.zip
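Once the archive is downloaded, a short script can show what it contains and pull selected fields out of the JSON Lines metadata. The following Python sketch is written under stated assumptions: it looks for any member ending in .jsonl rather than assuming a particular path inside the archive, it treats each line as a JSON object, and the column names you pass are hypothetical top-level keys that you would need to check against the actual file. The function names jsonl_members and jsonl_to_rows are invented for this example.

```python
import csv
import json
import sys
import zipfile


def jsonl_members(archive_path):
    """Return the names of archive members that end in .jsonl."""
    with zipfile.ZipFile(archive_path) as archive:
        return [name for name in archive.namelist() if name.endswith(".jsonl")]


def jsonl_to_rows(archive_path, member, columns):
    """Read one JSON Lines member and return rows limited to `columns`."""
    rows = []
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member) as handle:
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                # Missing top-level keys become empty cells.
                rows.append([record.get(column, "") for column in columns])
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python jsonl_to_tsv.py brca1-sequences.zip [column ...]
    archive_path, columns = sys.argv[1], sys.argv[2:]
    for member in jsonl_members(archive_path):
        print(f"# {member}", file=sys.stderr)
        if columns:
            writer = csv.writer(sys.stdout, delimiter="\t")
            writer.writerow(columns)
            writer.writerows(jsonl_to_rows(archive_path, member, columns))
```

Running it without column names only lists the JSON Lines files found in the archive. Since the download also includes metadata in tabular format, that file may already be enough for many uses.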
See our help documentation for more information on using the datasets command-line tool to access ortholog data.
See the full blog post on NCBI Insights.
e.cox : I am assuming you are posting this in some official capacity by an affiliation with NCBI.
JSON formats are fine for programmers but impossible for bench biologists to understand or manage. Are there any plans to allow reformatting the end result in plain text format (as in EntrezDirect)? An alternative is to include information on how to reformat the output so it becomes biologist-readable.