Mismatch between Roary and Prokka: genes missing from "gene_presence_absence.csv"
2.5 years ago
greed

Hi there! I recently ran Roary on several bacterial strains using GFF annotations produced by Prokka. When I examined the outputs, I noticed that some gene IDs present in the Prokka GFF annotations are missing from the "gene_presence_absence.csv" file produced by Roary. How is that possible? Thank you.
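
One way to pin down exactly which IDs are missing is to compare the CDS IDs in a Prokka GFF with the IDs listed in that isolate's column of gene_presence_absence.csv. The Python sketch below is only an illustration under some assumptions: that Prokka writes each CDS locus tag into the GFF `ID` attribute, that Roary reports those same IDs in a column named after the GFF file (without `.gff`), and that a cell can hold several tab-separated IDs. The function names and the toy strain `STRAIN` are hypothetical.

```python
import csv
import io


def gff_cds_ids(gff_lines):
    """Collect the ID attribute of every CDS feature in a GFF3 file."""
    ids = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # Prokka appends the contig sequences after this line
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key == "ID":
                ids.add(value)
    return ids


def roary_isolate_ids(csv_handle, isolate):
    """Collect every gene ID listed in one isolate column of gene_presence_absence.csv."""
    ids = set()
    for row in csv.DictReader(csv_handle):
        cell = row.get(isolate) or ""
        # Assumption: several IDs in one cell (paralogs) are tab-separated
        ids.update(x for x in cell.split("\t") if x)
    return ids


# Toy inputs for a hypothetical strain "STRAIN"; on real data use
# open("STRAIN.gff") and open("gene_presence_absence.csv", newline="")
gff = """##gff-version 3
contig1\tProdigal:2.6\tCDS\t1\t300\t.\t+\t0\tID=STRAIN_00001;product=hypothetical protein
contig1\tProdigal:2.6\tCDS\t400\t900\t.\t-\t0\tID=STRAIN_00002;product=DNA gyrase subunit A
##FASTA
>contig1
ACGT
"""
roary_csv = '"Gene","Annotation","STRAIN"\n"gyrA","DNA gyrase subunit A","STRAIN_00002"\n'

missing = sorted(gff_cds_ids(io.StringIO(gff)) - roary_isolate_ids(io.StringIO(roary_csv), "STRAIN"))
print(missing)  # ['STRAIN_00001']
```

The resulting list is a concrete set of locus tags you can then look up by hand in the GFF.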

id roary gene prokka

I know it's a bit of an old ping... but did you figure it out? I've used Roary for almost a decade, and now I have the same problem on a tiny dataset of only 5 genomes with 3500 genes each. One of my divergent strains, however, has only 160 of its genes in the output, even though you would expect all unique genes to be present and the full 3500 to be accounted for. I'm worried that Roary is getting old and has compatibility issues with the GFFs produced by newer versions of Prokka. The missing genes are also randomly distributed, with no clear pattern :S Scratching my head.
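
To see whether a whole strain is under-represented, rather than a handful of genes, you could count how many IDs each isolate contributes to gene_presence_absence.csv and compare that with the number of CDS in its GFF. This is a minimal Python sketch; it assumes the first 14 columns are Roary's summary columns (Gene, Annotation, No. isolates and so on) and that every remaining column is an isolate, so check the header of your own file first. The function name is hypothetical, and the demo uses a trimmed file with only 2 summary columns.

```python
import csv
import io


def genes_per_isolate(csv_handle, n_summary_cols=14):
    """Count gene IDs per isolate column in a Roary gene_presence_absence.csv."""
    reader = csv.reader(csv_handle)
    header = next(reader)
    isolates = header[n_summary_cols:]  # assumption: isolate columns come last
    counts = {name: 0 for name in isolates}
    for row in reader:
        for name, cell in zip(isolates, row[n_summary_cols:]):
            # Empty cell = gene absent; paralogs assumed tab-separated in one cell
            counts[name] += sum(1 for x in cell.split("\t") if x)
    return counts


# Trimmed toy file with only 2 summary columns, hence n_summary_cols=2
demo = io.StringIO(
    "Gene,Annotation,A,B\n"
    "gyrA,DNA gyrase,A_00002,B_00010\n"
    "group_1,hypothetical protein,\"A_00001\tA_00005\",\n"
)
print(genes_per_isolate(demo, n_summary_cols=2))  # {'A': 3, 'B': 1}
```

A strain whose count is far below its CDS count (like 160 out of 3500) points to a problem with that one input rather than with the clustering of individual genes.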

Hi Lesley! Interesting comment... I am also using Prokka and Roary for a pangenome analysis of S. epidermidis, but I am running into some trouble. In the results generated by Scoary, a group of genes (let's say group_xxx) appears to be present in very few samples. But when I take its sequence and BLAST it, I find it in many more samples (FASTA files). I also tried a different approach, Bakta for annotation and Panaroo for the pangenome, and the problem persists. I'm not sure what is going wrong, or whether it has something to do with a threshold value. I am new to this and would greatly appreciate any advice, as you seem quite experienced in this area... Best!

What kind of genes are you missing? As far as I know, Roary only works with protein-coding genes.
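
If Roary only clusters protein-coding genes, missing IDs that belong to tRNA, rRNA or other non-CDS features would be expected. A quick check is to look up the feature type (column 3 of the GFF) for each missing ID. A hypothetical Python sketch, assuming a Prokka-style GFF3 in which non-CDS features also carry an `ID` attribute; the toy IDs and the `missing_ids` list are made up:

```python
import io
from collections import Counter


def feature_type_by_id(gff_lines):
    """Map each feature ID in a GFF3 file to its feature type (column 3)."""
    types = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # stop before the embedded sequences
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key == "ID":
                types[value] = cols[2]
    return types


gff = """##gff-version 3
contig1\tProdigal:2.6\tCDS\t1\t300\t.\t+\t0\tID=X_00001;product=hypothetical protein
contig1\tAragorn:1.2\ttRNA\t350\t425\t.\t+\t.\tID=X_00002;product=tRNA-Leu
contig1\tbarrnap:0.9\trRNA\t500\t1900\t.\t+\t.\tID=X_00003;product=16S ribosomal RNA
"""
# Hypothetical IDs present in the GFF but absent from gene_presence_absence.csv
missing_ids = ["X_00002", "X_00003"]
types = feature_type_by_id(io.StringIO(gff))
summary = Counter(types.get(i, "not in GFF") for i in missing_ids)
print(summary)
```

If the tally is dominated by CDS, the feature type is not the explanation and the missing genes need a closer look.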

In the Prokka annotation, the ID is "hypothetical protein".

the ID is "hypothetical protein"

That is the gene product name, not the gene ID.
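
In a Prokka GFF both values live in column 9, as `key=value` pairs separated by `;`: `ID` (and `locus_tag`) hold the unique identifier, while `product` holds the functional annotation, which many genes share (for example "hypothetical protein"). A small Python sketch of splitting that column; the attribute string and the `ABC_` prefix are made up for illustration:

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 column-9 string into a dict, decoding %-escapes."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


# Made-up attribute string in the style of a Prokka CDS line
attrs = parse_attributes("ID=ABC_00042;locus_tag=ABC_00042;product=hypothetical protein")
print(attrs["ID"])       # ABC_00042 (unique per gene)
print(attrs["product"])  # hypothetical protein (shared by many genes)
```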

By the way, I had a similar issue with a different tool (Anvi'o). It turned out that truncated genes, or genes spanning multiple contigs, were not included in the pangenome analysis. I would manually check some of these genes and try to understand why they are missing from the pangenome.
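
A rough way to follow that advice in bulk is to flag CDS that touch either end of their contig, since a gene cut by an assembly break usually ends up as a fragment at a contig edge. This Python sketch is a heuristic, not something Roary or Anvi'o does; it assumes the GFF has `##sequence-region` lines giving each contig's length (Prokka GFFs normally do), and the function name, the `margin` parameter and the toy IDs are hypothetical.

```python
import io


def cds_at_contig_edges(gff_lines, margin=0):
    """Return IDs of CDS features lying within `margin` bp of either contig end."""
    lengths = {}
    hits = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if line.startswith("##sequence-region"):
            # Format: "##sequence-region <seqid> <start> <end>"
            _, seqid, _, end = line.split()
            lengths[seqid] = int(end)
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS" or cols[0] not in lengths:
            continue
        start, end = int(cols[3]), int(cols[4])
        # Distance to the left edge is start - 1, to the right edge is length - end
        if start - 1 <= margin or lengths[cols[0]] - end <= margin:
            attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
            hits.append(attrs.get("ID", "?"))
    return hits


gff = """##gff-version 3
##sequence-region contig1 1 1000
contig1\tProdigal:2.6\tCDS\t1\t300\t.\t+\t0\tID=S_00001;product=hypothetical protein
contig1\tProdigal:2.6\tCDS\t400\t900\t.\t-\t0\tID=S_00002;product=DNA gyrase subunit A
contig1\tProdigal:2.6\tCDS\t910\t1000\t.\t+\t0\tID=S_00003;product=hypothetical protein
"""
print(cds_at_contig_edges(io.StringIO(gff)))  # ['S_00001', 'S_00003']
```

Intersecting this list with the IDs missing from the pangenome would show whether contig-edge fragments explain most of them.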

I'm sorry I couldn't be more helpful.

Thank you, you've been helpful.