I'm working on a mega-project involving all proteobacterial proteomes. Recently, NCBI relocated all assembly records here. Previously, records for plasmids were kept separately from those for the main chromosomes, but now they are placed in one file, for example GCA_000010825.1_ASM1082v1_protein.faa.gz.
Question: within such a file, how can I tell which proteins came from plasmids and which came from chromosomes?
If I were working on one genome, I could trace each protein individually, but with more than 2,000 complete genomes that is not feasible. Also, I cannot rely on sequence annotations such as "plasmid backbone", since not all plasmid proteins are necessarily "plasmid backbone" proteins.
Any ideas?
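To make the question concrete, here is a minimal sketch of the kind of batch lookup I have in mind. It is not a confirmed solution. It assumes each assembly also has a tab-delimited feature table whose header row names `feature`, `seq_type` and `product_accession` columns, with `seq_type` saying "chromosome" or "plasmid". Those column names are my assumption, and the function name `proteins_by_seq_type` is made up for illustration.

```python
import csv
import io
from collections import defaultdict


def proteins_by_seq_type(handle):
    """Group protein accessions by replicon type.

    `handle` is an open text handle on a tab-delimited feature table.
    ASSUMPTION: the first row is a header (optionally prefixed with '# ')
    containing 'feature', 'seq_type' and 'product_accession' columns.
    """
    header = None
    groups = defaultdict(set)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if header is None:
            # Strip the leading '# ' from the first column name, if present.
            header = [col.lstrip("# ").strip() for col in row]
            continue
        record = dict(zip(header, row))
        accession = record.get("product_accession", "")
        # Only CDS rows that actually carry a protein accession are kept.
        if record.get("feature") == "CDS" and accession:
            groups[record.get("seq_type", "unknown")].add(accession)
    return dict(groups)


if __name__ == "__main__":
    # Tiny made-up table, only to show the expected input shape.
    sample = (
        "# feature\tclass\tseq_type\tproduct_accession\n"
        "CDS\twith_protein\tchromosome\tP1.1\n"
        "CDS\twith_protein\tplasmid\tP2.1\n"
    )
    for seq_type, accessions in proteins_by_seq_type(io.StringIO(sample)).items():
        print(seq_type, sorted(accessions))
```

With such a mapping, each FASTA header in the protein file could be looked up by its accession. For a compressed table, the handle could come from `gzip.open(path, "rt")`.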
Wow! Thank you very much! I would have never known this.
Very useful, thanks.