I am trying to exclude unplaced contigs and provisional RefSeq entries from my local copy of the RefSeq database, but I am not sure how to do this. I tried blastdbcmd, but I could not find a way to search for a string that is not part of the sequence identifier. I also searched NCBI for all RefSeq entries containing the word 'provisional', but the results included many entries that do not contain that word.
I was able to get a list of gi numbers for all RefSeq entries that have the words 'unplaced genomic contig' in the entry header.
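For illustration, a header-based selection of this kind can be sketched in Python as below. This is only a sketch and not necessarily how the list above was produced. It assumes the entry headers have already been dumped to text with one identifier and one title per line, separated by a tab (how that dump is made is left open), and the function name `select_ids_by_title` and the two example lines are hypothetical.

```python
def select_ids_by_title(lines, phrase):
    """Return identifiers whose title contains `phrase` (case-insensitive).

    Assumed input format: one entry per line, "identifier<TAB>title".
    """
    wanted = phrase.lower()
    ids = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only; the title may contain anything else.
        seq_id, _, title = line.partition("\t")
        if wanted in title.lower():
            ids.append(seq_id)
    return ids


# Hypothetical example lines, not real RefSeq records.
example = [
    "111\tHomo sapiens unplaced genomic contig, example A",
    "222\tHomo sapiens chromosome 1, example B",
]
print(select_ids_by_title(example, "unplaced genomic contig"))
```

The same function would only help with 'provisional' if that word actually appears in the dumped header text, which is the open question here.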
Any ideas on how to get the sequences labelled as provisional?
Thank you.