How to get all chromosome RefSeq IDs from a RefSeq assembly ID?
1 day ago
Maria • 0

I have a RefSeq assembly accession ID for an organism that has two chromosomes. I want to retrieve from NCBI the RefSeq IDs of the chromosomes and their lengths in base pairs.

I have been trying to use Entrez in Python:

from Bio import Entrez
Entrez.email = "my@email"
handle = Entrez.esummary(db="assembly", term=RefSeq_assembly_accession)
summary = Entrez.read(handle)

but often no assembly seems to be found for the accession I provide, even though the assembly is there when I look it up manually on the NCBI website.

Example: I have the following RefSeq assembly accession ID: RefSeq_assembly_accession = "GCF_030718785.1"

I would like to retrieve:

chromosome_1 = "NZ_CP132190.1"
chromosome_1_size = 2,959,192
chromosome_2 = "NZ_CP132189.1"
chromosome_2_size = 1,107,495

NCBI record for this example: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_030718785.1/

Any help would be greatly appreciated! Thanks :)

1 day ago
GenoMax 148k

Using the NCBI datasets/dataformat command-line utilities (LINK):

$ datasets summary genome accession GCF_030718785.1 --report sequence --as-json-lines | dataformat tsv genome-seq --fields accession,genbank-seq-acc,refseq-seq-acc,chr-name,mol-type,seq-length

Assembly Accession      GenBank seq accession   RefSeq seq accession    Chromosome name Molecule type   Seq length
GCF_030718785.1 CP132190.1      NZ_CP132190.1   1       Chromosome      2959192
GCF_030718785.1 CP132189.1      NZ_CP132189.1   2       Chromosome      1107495
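
If the result is needed inside a Python script, the table above can be parsed into a mapping from RefSeq sequence accession to length. This is only a minimal sketch: it assumes the column headers are exactly as printed above and that the fields are tab-separated. The function name `parse_sequence_report` and the `SAMPLE_TSV` string are illustrative and not part of the datasets tools.

```python
import csv
import io

# Sample text copied from the dataformat output above (assumed tab-separated).
SAMPLE_TSV = (
    "Assembly Accession\tGenBank seq accession\tRefSeq seq accession\t"
    "Chromosome name\tMolecule type\tSeq length\n"
    "GCF_030718785.1\tCP132190.1\tNZ_CP132190.1\t1\tChromosome\t2959192\n"
    "GCF_030718785.1\tCP132189.1\tNZ_CP132189.1\t2\tChromosome\t1107495\n"
)


def parse_sequence_report(tsv_text):
    """Return {RefSeq seq accession: length in bp} for chromosome rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lengths = {}
    for row in reader:
        # Skip plasmids and other non-chromosome molecules, if any.
        if row["Molecule type"] != "Chromosome":
            continue
        lengths[row["RefSeq seq accession"]] = int(row["Seq length"])
    return lengths


chromosomes = parse_sequence_report(SAMPLE_TSV)
for accession, length in chromosomes.items():
    print(accession, length)
```

In practice the text would come from the command above, for example by saving its output to a file and passing the file contents to the function.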