I have found Biostars very helpful! The following question has been puzzling me recently. On the NCBI GenBank page (link) for accession ID DS264095 I can see the genome features and the sequence, but Entrez does not retrieve the sequence. The output of the following code shows a sequence made up entirely of Ns, although its length matches the size of the genome (1030563 bp). I also retrieved the corresponding gbk file, but it contains only the features and the CONTIG entry, without the actual genome sequence. Do you have any suggestions? Thank you!
from Bio import Entrez
from Bio import SeqIO

# https://www.ncbi.nlm.nih.gov/nuccore/147747968?report=genbank
handle = Entrez.efetch(db='nuccore', rettype='gb', id='DS264095', retmode='text')
for seqRecord in SeqIO.parse(handle, 'genbank'):
    seq = seqRecord.seq
    print('seq:', seq[0:100])
    print('len:', len(seq))
# Output:
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
len: 1030563
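One likely explanation for the Ns: DS264095 is a CON (constructed) record, and for such records `rettype='gb'` returns only the assembly instructions, so Biopython knows the length but not the bases. NCBI's EFetch also documents a `gbwithparts` return type that includes the full sequence. Below is a minimal sketch, assuming Biopython is installed and NCBI is reachable; the helper names `efetch_params` and `fetch_full_record` are hypothetical, and I have not run this against the live service.

```python
def efetch_params(accession):
    """Build EFetch arguments that request the full sequence of a CON record."""
    return {
        "db": "nuccore",
        "id": accession,
        # 'gb' returns only the CONTIG join for CON records;
        # 'gbwithparts' asks NCBI to include the sequence as well.
        "rettype": "gbwithparts",
        "retmode": "text",
    }


def fetch_full_record(accession, email):
    """Download one GenBank record with its sequence (needs network access)."""
    # Imported here so that efetch_params() can be used without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks callers to supply a contact address
    handle = Entrez.efetch(**efetch_params(accession))
    try:
        # SeqIO.read expects exactly one record in the handle.
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()
```

With this in place, something like `record = fetch_full_record('DS264095', 'you@example.com')` (placeholder email) followed by `print(record.seq[:100])` should show real bases rather than Ns, if the assumption about `gbwithparts` holds for this record.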
The retrieved gbk file is shown below; its content differs from what is displayed on the NCBI site.
LOCUS DS264095 1030563 bp DNA linear CON 18-MAY-2007
DEFINITION Burkholderia mallei FMH scf_1099471655815 genomic scaffold, whole
genome shotgun sequence.
ACCESSION DS264095 AAIQ02000000
VERSION DS264095.1
DBLINK BioProject: PRJNA13987
BioSample: SAMN02435848
KEYWORDS WGS.
SOURCE Burkholderia mallei FMH
ORGANISM Burkholderia mallei FMH
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales;
Burkholderiaceae; Burkholderia; pseudomallei group.
REFERENCE 1 (bases 1 to 1030563)
AUTHORS DeShazer,D., Woods,D.E. and Nierman,W.C.
TITLE Direct Submission
JOURNAL Submitted (06-MAR-2007) The Institute for Genomic Research, 9712
Medical Center Drive, Rockville, MD 20850, USA
FEATURES Location/Qualifiers
source 1..1030563
/organism="Burkholderia mallei FMH"
/mol_type="genomic DNA"
/strain="FMH"
/db_xref="taxon:334802"
CONTIG join(AAIQ02000135.1:1..8136,gap(457),AAIQ02000043.1:1..44311,
gap(1031),AAIQ02000166.1:1..3120,gap(1729),AAIQ02000194.1:1..761,
gap(36),AAIQ02000039.1:1..45955,gap(118),AAIQ02000068.1:1..28166,
gap(192),AAIQ02000195.1:1..749,gap(685),AAIQ02000204.1:1..289,
gap(154),AAIQ02000163.1:1..3538,gap(375),AAIQ02000123.1:1..10313,
gap(588),AAIQ02000142.1:1..7218,gap(466),AAIQ02000021.1:1..68386,
gap(239),AAIQ02000069.1:1..27890,gap(395),AAIQ02000099.1:1..17802,
gap(481),AAIQ02000038.1:1..45969,gap(717),AAIQ02000152.1:1..6039,
gap(100),AAIQ02000162.1:1..3813,gap(349),AAIQ02000130.1:1..9302,
gap(951),AAIQ02000104.1:1..15966,gap(744),AAIQ02000082.1:1..23397,
gap(2853),AAIQ02000178.1:1..2005,gap(36),AAIQ02000189.1:1..1160,
gap(36),AAIQ02000120.1:1..10635,gap(36),AAIQ02000184.1:1..1724,
gap(489),AAIQ02000121.1:1..10540,gap(720),AAIQ02000055.1:1..34907,
gap(378),AAIQ02000117.1:1..11883,gap(254),AAIQ02000033.1:1..54313,
gap(288),AAIQ02000137.1:1..7858,gap(863),AAIQ02000115.1:1..12452,
gap(592),AAIQ02000009.1:1..106604,gap(722),AAIQ02000149.1:1..6242,
gap(593),AAIQ02000186.1:1..1381,gap(36),AAIQ02000169.1:1..2881,
gap(468),AAIQ02000148.1:1..6247,gap(437),AAIQ02000164.1:1..3492,
gap(464),AAIQ02000126.1:1..10017,gap(636),AAIQ02000141.1:1..7280,
gap(731),AAIQ02000174.1:1..2399,gap(36),AAIQ02000173.1:1..2519,
gap(246),AAIQ02000013.1:1..98333,gap(237),AAIQ02000168.1:1..2885,
gap(278),AAIQ02000106.1:1..15267,gap(583),AAIQ02000177.1:1..2034,
gap(495),AAIQ02000183.1:1..1748,gap(804),AAIQ02000046.1:1..41423,
gap(357),AAIQ02000167.1:1..3108,gap(36),AAIQ02000171.1:1..2650,
gap(36),AAIQ02000087.1:1..22096,gap(728),AAIQ02000199.1:1..476,
gap(199),AAIQ02000180.1:1..1969,gap(36),AAIQ02000205.1:1..262,
gap(262),AAIQ02000129.1:1..9762,gap(590),AAIQ02000160.1:1..4201,
gap(473),AAIQ02000150.1:1..6226,gap(1027),AAIQ02000176.1:1..2096,
gap(279),AAIQ02000032.1:1..57068,gap(491),AAIQ02000094.1:1..18896,
gap(669),AAIQ02000058.1:1..33027,gap(36),AAIQ02000201.1:1..386,
gap(625),AAIQ02000125.1:1..10029)
//
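The CONTIG line carries the key information here: it describes the scaffold as an ordered list of component contigs and gaps rather than as bases. As a rough illustration using only the standard library, the join can be split into its parts and the total length computed. `parse_contig` is a hypothetical helper, and it handles only the simple `ACCESSION.VERSION:start..end` and `gap(n)` forms seen in this record.

```python
import re

# Matches a component such as AAIQ02000135.1:1..8136 or a gap such as gap(457).
PART = re.compile(
    r"(?P<acc>[A-Z0-9_]+\.\d+):(?P<start>\d+)\.\.(?P<end>\d+)"
    r"|gap\((?P<gap>\d+)\)"
)


def parse_contig(join_text):
    """Return (name, length) pairs for each component or gap in a CONTIG join."""
    parts = []
    for m in PART.finditer(join_text):
        if m.group("gap") is not None:
            parts.append(("gap", int(m.group("gap"))))
        else:
            # Coordinates are 1-based and inclusive.
            length = int(m.group("end")) - int(m.group("start")) + 1
            parts.append((m.group("acc"), length))
    return parts


# First few entries of the CONTIG line from the record above.
example = "join(AAIQ02000135.1:1..8136,gap(457),AAIQ02000043.1:1..44311,gap(1031))"
parts = parse_contig(example)
for name, length in parts:
    print(name, length)
print("total:", sum(length for _, length in parts))
```

Applied to the whole CONTIG line, the total should equal the length on the LOCUS line (1030563 bp), gaps included, which would explain why the reported length is correct even though no bases are present.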
Please format your post appropriately in the future.
Sure, I will make sure my future posts are well formatted. Thanks, Ram.