Question

Pangenome analysis

13 months ago

erinda ▴ 10

Hi all! I am trying to use Prokka annotation and Roary for pangenome analysis of Staphylococcus epidermidis, but I am running into a problem. According to the Scoary results, a gene group (let's say group_xxx) is present in very few samples. But when I take its sequence and BLAST it, I find it in many more samples (FASTA files). I also tried a different approach, with Bakta for the annotation and Panaroo for the pangenome, and the problem persists. I'm not sure what is going wrong, or whether it has something to do with a threshold value. I am new to this and would greatly appreciate any advice, or hearing whether you have had a similar problem with these tools.

Also, how would you double-check the results from Prokka-Roary / Bakta-Panaroo?
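One way to make this kind of cross-check systematic is to compare the pangenome table against the BLAST hits directly. The following is only a rough sketch: the function name `missed_by_pangenome` and the `n_meta` parameter are made up for illustration, and it assumes the sample columns start after 14 annotation columns, as in Roary's gene_presence_absence.csv (check the header of your own file, since other tools lay this out differently). The list of samples with a BLAST hit is assumed to have been collected separately.

```python
import csv


def missed_by_pangenome(presence_csv, group, blast_hit_samples, n_meta=14):
    """Return samples that have a BLAST hit for `group` but an empty cell
    for it in the presence/absence table.

    presence_csv: an open file, or any iterable of CSV lines.
    n_meta: number of annotation columns before the first sample column
            (assumption: 14 for Roary output; verify against the header).
    """
    reader = csv.reader(presence_csv)
    samples = next(reader)[n_meta:]
    for row in reader:
        if row[0] == group:
            # A non-empty cell means the tool called the gene in that sample.
            called = {s for s, cell in zip(samples, row[n_meta:]) if cell.strip()}
            return sorted(set(blast_hit_samples) - called)
    raise KeyError(f"{group} not found in presence/absence table")


# Toy table with 3 annotation columns, for illustration only.
toy = [
    "Gene,Non-unique Gene name,Annotation,s1,s2,s3",
    "group_1,,hypothetical protein,s1_00012,,",
]
print(missed_by_pangenome(toy, "group_1", ["s1", "s2", "s3"], n_meta=3))
```

On real output the file would be opened with `open("gene_presence_absence.csv", newline="")` and passed in place of `toy`. Samples returned here are worth inspecting by hand: the gene may be sitting in a different group, or may not have been annotated in that genome at all.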

Best wishes!

roary pangenome bakta prokka • 1.2k views

12 months ago by erinda ▴ 10

What command are you running?

You may need the "don't split paralogs" option, -s, in Roary. Your gene of interest may have multiple copies in each genome and may be getting split into different gene groups.
You may also need to adjust the blastp identity required for sequences to count as the same gene group. The Roary site says to be careful not to go below 90 ("-i 90").
What is the size of your core genome? It's possible your strains are quite diverse, or that you have 1 or 2 outliers. Making a core genome tree or checking the size of the core genome would give you this information.
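To get a quick feel for core genome size and possible outlier genomes, the presence/absence table can be summarised directly. This is a minimal sketch under a few assumptions: the function name `core_genome_summary` is hypothetical, `n_meta=14` reflects the number of annotation columns in Roary's gene_presence_absence.csv (check your header), and "core" is taken to mean present in at least 99% of isolates, which is Roary's default cutoff.

```python
import csv


def core_genome_summary(presence_csv, n_meta=14, core_fraction=0.99):
    """Summarise a gene presence/absence table.

    Returns (total_groups, core_groups, genes_per_sample), where
    genes_per_sample maps each sample name to the number of gene groups
    that have a non-empty cell for it.
    """
    reader = csv.reader(presence_csv)
    samples = next(reader)[n_meta:]
    genes_per_sample = dict.fromkeys(samples, 0)
    total = core = 0
    for row in reader:
        present = [s for s, cell in zip(samples, row[n_meta:]) if cell.strip()]
        total += 1
        # A group is core if it is present in enough of the samples.
        if len(present) >= core_fraction * len(samples):
            core += 1
        for s in present:
            genes_per_sample[s] += 1
    return total, core, genes_per_sample


# Toy table with 3 annotation columns, for illustration only.
toy = [
    "Gene,Non-unique Gene name,Annotation,s1,s2,s3",
    "g1,,x,a,b,c",
    "g2,,x,a,b,",
    "g3,,x,,,c",
    "g4,,x,,,d",
]
total, core, per_sample = core_genome_summary(toy, n_meta=3)
print(total, core, per_sample)
```

A sample whose gene count sits far from the rest, or a core count that is very small relative to the total, would point towards the diverse-strains or outlier scenario.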

I like to use PIRATE for core genome & making gene alignments, it's always been a bit more intuitive and easier to use for me. https://github.com/SionBayliss/PIRATE

13 months ago by Madde ▴ 20

Hi Madde! Thanks a lot, this was really helpful! Indeed, the "-s" option solved the problem.

These were the commands that I was running:

Prokka: PROKKA_OPTS="--cpus 8 --force --compliant --centre X --locustag PROKKA --kingdom Bacteria --genus Staphylococcus --species epidermidis --usegenus"

Roary: roary -e --mafft -s -n -p 8 -f roary_output -v -i 95 *.gff

Scoary: scoary -g gene_presence_absence.csv -t traits.csv

However, in the end I obtain many group_XXX / hypothetical proteins. I'm not sure how to get a better annotation with Prokka.
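To put a number on "many", the share of gene groups annotated as hypothetical proteins can be counted from the same table. This is just a sketch: `hypothetical_fraction` is a made-up helper name, and it assumes the table has an "Annotation" column, as gene_presence_absence.csv does.

```python
import csv


def hypothetical_fraction(presence_csv):
    """Return (hypothetical_groups, total_groups) for a presence/absence table
    that has an 'Annotation' column."""
    reader = csv.DictReader(presence_csv)
    total = hypothetical = 0
    for row in reader:
        total += 1
        # Missing or empty annotations are treated as not hypothetical here.
        if "hypothetical protein" in (row.get("Annotation") or "").lower():
            hypothetical += 1
    return hypothetical, total


# Toy table, for illustration only.
toy = [
    "Gene,Non-unique Gene name,Annotation,s1,s2",
    "group_1,,hypothetical protein,a,b",
    "dnaA,,Chromosomal replication initiator protein DnaA,a,b",
    "group_2,,Hypothetical protein,a,",
]
print(hypothetical_fraction(toy))
```

Running this before and after changing annotation settings gives a simple way to see whether the change actually reduced the number of unnamed groups.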

12 months ago by erinda ▴ 10