NCBI has two sections for assemblies: GenBank (all submitted sequences) and RefSeq (curated GenBank sequences).
An assembly list for each is available here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
and ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/assembly_summary_genbank.txt
Now I want to get a list of assembly (genome) accession numbers that are in GenBank but not in RefSeq. Unfortunately, I could not find any mapping file on NCBI's sites. Does anyone have an idea how to obtain that list?
Unless I'm missing a trick, this should be as simple as something like:
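A minimal sketch of that comparison in Python, assuming both summary files have been downloaded locally, are tab-separated, and have a final `#` comment line holding the column names. The column names `assembly_accession` and `gbrs_paired_asm` (the paired GenBank/RefSeq accession) are assumptions taken from the files' header, so check them against your copy. The helper names `read_summary` and `genbank_only` are made up for this example.

```python
def read_summary(lines):
    """Yield one dict per assembly row from an assembly_summary file."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Assumption: the last comment line before the data names the columns.
            header = line.lstrip("# ").split("\t")
            continue
        if not line:
            continue
        if header is None:
            raise ValueError("no '#' header line found before the data rows")
        yield dict(zip(header, line.split("\t")))


def genbank_only(genbank_lines, refseq_lines):
    """Return GenBank accessions that no RefSeq row lists as its paired assembly."""
    # Each RefSeq row is assumed to name its GenBank counterpart (GCA_...)
    # in the gbrs_paired_asm column.
    paired = {row.get("gbrs_paired_asm") for row in read_summary(refseq_lines)}
    return [
        row["assembly_accession"]
        for row in read_summary(genbank_lines)
        if row["assembly_accession"] not in paired
    ]
```

Passing two open file handles, e.g. `genbank_only(open("assembly_summary_genbank.txt"), open("assembly_summary_refseq.txt"))`, should give the list of accessions. The match is on the full versioned accession, so a GenBank assembly whose RefSeq partner points at a different version would still be reported.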
I haven't double-checked that this is 100% accurate, though, and it assumes I got the files the right way round!