Hi all! I am trying to use Prokka for annotation and Roary for pangenome analysis (S. epidermidis), but I am running into trouble. In the results generated by Scoary, a group of genes (let's say group_xxx) appears to be present in very few samples. But when I take its sequence and BLAST it against the samples (FASTA files), I find it in many more of them. I also tried a different approach, with Bakta for the annotation and Panaroo for the pangenome, and the problem persists. I'm not sure what is going wrong, or whether it has something to do with a threshold value. I am new to this, and I would greatly appreciate it if you could give me any advice on this, or let me know if you have had a similar problem with these tools.
Also, how would you double-check the results from Prokka-Roary / Bakta-Panaroo?
Best wishes!
What command are you running?
What is the size of your core genome? It's possible your strains are quite diverse, or you have 1 or 2 outliers. Making a core genome tree or evaluating the size of the core genome would give you this information.
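If it helps, here is a rough Python sketch for reading the core genome size straight from Roary's gene_presence_absence.csv. It assumes the standard Roary layout, where the first 14 columns are metadata, every later column is one sample, and an empty cell means the gene is absent in that sample. The function name and the 99% cutoff default are my own choices for illustration, not part of Roary.

```python
import csv
import io

N_METADATA_COLS = 14  # assumption: standard Roary gene_presence_absence.csv layout


def summarize_pangenome(handle, core_fraction=0.99, n_meta=N_METADATA_COLS):
    """Return (n_samples, n_gene_clusters, n_core_clusters) for a Roary-style CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[n_meta:]  # one column per input GFF
    n_clusters = 0
    n_core = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        n_clusters += 1
        # A non-empty cell means the cluster has at least one gene in that sample.
        present = sum(1 for cell in row[n_meta:] if cell.strip())
        if present >= core_fraction * len(samples):
            n_core += 1
    return len(samples), n_clusters, n_core
```

You would call it with something like open("roary_output/gene_presence_absence.csv", newline="") as the handle (the path is a guess based on a typical output folder). A core count that is very small relative to the total number of clusters would point to diverse strains or a few outliers.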
I like to use PIRATE for core genome & making gene alignments, it's always been a bit more intuitive and easier to use for me. https://github.com/SionBayliss/PIRATE
Hi Madde! Thanks a lot! This was really helpful! Indeed, the "-s" option (don't split paralogs) solved the problem.
These were the commands that I was running:
Prokka: PROKKA_OPTS="--cpus 8 --force --compliant --centre X --locustag PROKKA --kingdom Bacteria --genus Staphylococcus --species epidermidis --usegenus"
Roary: roary -e --mafft -s -n -p 8 -f roary_output -v -i 95 *.gff
Scoary: scoary -g gene_presence_absence.csv -t traits.csv
However, in the end I still get many group_XXX / hypothetical proteins. I'm not sure how to get a better annotation with Prokka.
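To put a number on how many clusters are affected, a quick count over gene_presence_absence.csv can help. The sketch below is only an illustration: it assumes the file has "Gene" and "Annotation" columns, as in Roary's standard output, and that unnamed clusters are labelled with a "group_" prefix. The function name is made up.

```python
import csv
import io


def count_hypothetical(handle):
    """Return (total_clusters, hypothetical_clusters, unnamed_clusters)."""
    reader = csv.DictReader(handle)
    total = hypothetical = unnamed = 0
    for row in reader:
        total += 1
        # Assumption: unannotated clusters carry the text "hypothetical protein".
        annotation = (row.get("Annotation") or "").strip().lower()
        if annotation == "hypothetical protein":
            hypothetical += 1
        # Assumption: clusters without a gene name are called group_<number>.
        if (row.get("Gene") or "").startswith("group_"):
            unnamed += 1
    return total, hypothetical, unnamed
```

Running this before and after changing the annotation settings gives a simple way to see whether the proportion of hypothetical proteins actually goes down.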