I've recently started working with circular genomes from GenBank. I'd like to establish criteria for labeling a sequence as "circular," independent of the original GenBank classification. I will have annotated genes to use as a reference.
Option 1: All expected genes must be present to classify a genome as "circular". Any missing gene would indicate an incomplete genome, which would then be labeled "linear".
Option 2: All expected genes must be present AND COMPLETE. The "circular" label is applied only if the genome is considered complete. This can be problematic if the origin occurs within a gene, effectively slicing it in half. I would have to demonstrate that central residues are likely missing, and labeling a sequence linear with gene segments on either end seems unintuitive.
Option 3: Always label the sequence as "circular" even if it's a partial genome. This wouldn't make much sense for single genes or short reads.
Option 4: Keep the original classification. GenBank submission accuracy can be dubious, so I'd prefer a more personalized treatment.
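To make the options easier to compare, here is a minimal Python sketch of how Options 1 and 2 could be combined into one rule, with a length cutoff to cover the short-read objection to Option 3. Everything in it is an assumption for illustration: the `GeneHit` record, the `classify_topology` function, the `partial` flag, the `min_len` cutoff, and the `edge_tol` tolerance are hypothetical names and not part of GenBank or any library. It treats a gene split across the two ends of the record as a single wrapped gene, and it does not try to prove that central residues are missing.

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    """One annotated gene fragment (hypothetical structure for this sketch)."""
    name: str
    start: int             # 0-based, inclusive
    end: int               # 0-based, exclusive
    partial: bool = False  # True if the annotation is truncated


def classify_topology(seq_len, hits, expected, min_len=1000, edge_tol=0):
    """Return 'circular' or 'linear' for one record."""
    # Guard for the Option 3 objection: short records are never circular.
    if seq_len < min_len:
        return "linear"

    # Group fragments by gene name.
    by_name = {}
    for hit in hits:
        by_name.setdefault(hit.name, []).append(hit)

    # Option 1: every expected gene must be present.
    if not set(expected) <= set(by_name):
        return "linear"

    # Option 2: every expected gene must be complete. The one exception is
    # a gene with truncated fragments touching both ends of the record,
    # which is read as a gene wrapping around the sequence start.
    for name in expected:
        truncated = [f for f in by_name[name] if f.partial]
        if not truncated:
            continue
        touches_start = any(f.start <= edge_tol for f in truncated)
        touches_end = any(f.end >= seq_len - edge_tol for f in truncated)
        if not (touches_start and touches_end):
            return "linear"

    return "circular"
```

With made-up gene names, `classify_topology(5000, hits, {"cox1", "nad1"})` returns "circular" when both genes are complete, and also when `cox1` appears as two partial fragments at positions 0-700 and 4200-5000. It returns "linear" when `nad1` is absent or when `cox1` is truncated at only one end. Whether the two end fragments actually add up to a full-length gene is not checked here and would need a comparison against the reference gene length.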