Hello everyone.
I'm working on a bunch of small viral genomes, and I need to construct their core genomes for a comparative study. I've tried using prokka and roary to do so, but my sequences are too small (10 kb or less): I keep getting errors and warnings even though I obtain an output, and I keep getting 0 core genes from roary. It turned out that prokka uses a tool called Prodigal, and the sequence size threshold it uses is 20 kb. So in the end, I can't get GFF3 files from prokka to run roary. Now I'm left with one of two solutions:
1/ Look for another source of GFF3 files to run roary and get my core genomes. 2/ Try something else (other than roary) to generate core genomes.
If you happen to know of other tools or scripts I can use in my case, please let me know. Many thanks!!
Separately from my other post, the concept of core genes in viruses does not make as much sense as in prokaryotes. Most viruses carry a minimal number of genes to begin with - usually only genes required to regulate their own or host transcription, and to replicate and rebuild capsids. Can't imagine that there will be much difference between somewhat related viruses of similarly-sized genomes.
Exactly, I was expecting a potentially high number of core genes, especially since these viruses are from the same genus (HIV and SIV). As for why I want to work with the core genome rather than the whole genome, it's because these are highly mutable viruses.
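For what it's worth, once each genome has a list of annotated genes from any source, the core genome is just the set of genes found in every genome. Here is a minimal Python sketch of that idea. The gene sets below are typed in by hand for illustration only (they are an assumption, not output from prokka or roary), and the names `gene_content`, `core_genes` and `accessory_genes` are made up for this example.

```python
# Illustrative gene content per genome. These sets are hand-written
# assumptions for the example, not parsed from real annotation files.
gene_content = {
    "HIV-1": {"gag", "pol", "env", "tat", "rev", "nef", "vif", "vpr", "vpu"},
    "HIV-2": {"gag", "pol", "env", "tat", "rev", "nef", "vif", "vpr", "vpx"},
    "SIVmac": {"gag", "pol", "env", "tat", "rev", "nef", "vif", "vpr", "vpx"},
}


def core_genes(content):
    """Return the genes present in every genome (the core set)."""
    gene_sets = list(content.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def accessory_genes(content):
    """Return the genes present in at least one genome but not in all."""
    all_genes = set().union(*content.values())
    return all_genes - core_genes(content)


if __name__ == "__main__":
    print("core:", sorted(core_genes(gene_content)))
    print("accessory:", sorted(accessory_genes(gene_content)))
```

This only works if the same gene carries the same name in every genome. With highly divergent sequences that usually means clustering the proteins by similarity first, which is the part roary normally handles.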