2.5 years ago
greed
Hi there! I recently ran Roary on some bacterial strains using GFF annotations produced by Prokka. When I examined the outputs, I noticed that some gene IDs that are present in the Prokka GFF annotation are missing from the "gene_presence_absence.csv" file produced by Roary. How is that possible? Thank you.
I know it's a bit of an old ping... but did you figure it out? I've used Roary for almost a decade, and have now hit the same problem on a tiny dataset of only 5 genomes with 3500 genes each. One of my divergent strains, however, only has 160 of its genes in the output, even though you would expect all unique genes to be present and the full 3500 to be accounted for. I'm worried Roary is becoming too old and is running into compatibility issues with newer versions of Prokka's GFFs or something. Also, the missing genes are randomly distributed and there is no clear pattern to them :S scratching my head
Hi Lesley! Interesting comment... I am also trying to use Prokka and Roary for a pangenome analysis of S. epidermidis, but I am running into some trouble. In the results generated by Scoary, a group of genes (let's say group_xxx) appears to be present in very few samples. But when I take its sequence and BLAST it, I find it in many more samples (FASTA files). I also tried a different approach, Bakta for annotation and Panaroo for the pangenome, and the problem is consistent. I'm not sure what is going wrong, or whether it has something to do with a threshold value. I am new to this and would greatly appreciate any advice, as I can see that you are quite experienced in this area... Best!
What kind of genes are you missing? As far as I know, Roary only works with protein-coding genes.
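One quick way to check this is to tally the feature types in the Prokka GFF, since any ID belonging to a tRNA, rRNA, or other non-CDS feature would not be expected in the Roary output if it only handles protein-coding genes. Below is a minimal Python sketch, assuming a standard 9-column GFF3 with the sequences appended after a `##FASTA` line; the function name `count_feature_types` and the toy records are made up for illustration.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally column 3 (feature type) of a GFF3 file, ignoring the FASTA part."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this marker is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


# Toy records (hypothetical IDs), only to show the expected input shape.
example_gff = [
    "##gff-version 3",
    "contig1\tProdigal:2.6\tCDS\t1\t300\t.\t+\t0\tID=STRAIN_00001;product=hypothetical protein",
    "contig1\tAragorn:1.2\ttRNA\t400\t475\t.\t-\t.\tID=STRAIN_00002;product=tRNA-Ala",
    "##FASTA",
    ">contig1",
]
feature_counts = count_feature_types(example_gff)
print(feature_counts)
```

For a real file you could pass an open file handle instead of the list, e.g. `count_feature_types(open("strain.gff"))`, where `strain.gff` is a placeholder path.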
In the Prokka annotation, the ID is a "hypothetical protein"
That is the gene product name, not the gene ID.
By the way, I had a similar issue with different software (Anvi'o). It turned out that truncated genes, or genes spanning multiple contigs, were not included in the pan-genome analysis. I would manually check some of these genes and try to understand why they are missing from the pan-genome.
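To get the list of genes to check by hand, something like the Python sketch below could help. It collects the `ID=` attribute of every CDS in one Prokka GFF and subtracts the IDs found in that sample's column of the Roary CSV. Several things here are assumptions rather than confirmed behaviour: that the sample column holds the same IDs as the GFF `ID=` attribute, that multiple IDs in one cell are separated by whitespace or semicolons, and all function names, the column name `strainA`, and the toy data are invented for illustration (a real Roary CSV has more columns than shown).

```python
import csv
import io
import re


def gff_cds_ids(gff_lines):
    """Collect the ID attribute of every CDS feature before the ##FASTA marker."""
    ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9 or fields[2] != "CDS":
            continue
        match = re.search(r"(?:^|;)ID=([^;]+)", fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def roary_ids(csv_text, sample_column):
    """Collect every ID listed in one sample's column of the presence/absence CSV."""
    ids = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row.get(sample_column) or ""
        # Assumption: several IDs in one cell are split by whitespace or ';'.
        ids.update(tok for tok in re.split(r"[\s;]+", cell) if tok)
    return ids


def missing_ids(gff_lines, csv_text, sample_column):
    """Return CDS IDs present in the GFF but absent from the sample's column."""
    return sorted(gff_cds_ids(gff_lines) - roary_ids(csv_text, sample_column))


# Toy data (hypothetical IDs and a cut-down set of CSV columns).
toy_gff = [
    "##gff-version 3",
    "contig1\tProdigal:2.6\tCDS\t1\t300\t.\t+\t0\tID=STRAIN_00001;product=hypothetical protein",
    "contig1\tProdigal:2.6\tCDS\t400\t900\t.\t+\t0\tID=STRAIN_00002;product=hypothetical protein",
    "contig2\tProdigal:2.6\tCDS\t1\t150\t.\t-\t0\tID=STRAIN_00003;product=hypothetical protein",
    "##FASTA",
]
toy_csv = (
    '"Gene","Annotation","strainA"\n'
    '"group_1","hypothetical protein","STRAIN_00001"\n'
    '"group_2","hypothetical protein","STRAIN_00002\tSTRAIN_00004"\n'
)
missing = missing_ids(toy_gff, toy_csv, "strainA")
print(missing)
```

With the missing IDs in hand, you can look up their coordinates in the GFF and see whether they sit at contig edges, look truncated, or share some other feature.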
I'm sorry I couldn't be more helpful.
Thank you, you've been helpful.