7.6 years ago
User 3325
How can I find core proteins (step by step) from data for 30+ strains using the Roary pangenome analysis tool?
Install Roary and Prokka
Go to path with fasta genomes
> cd path/to/fastas/
Annotate fasta genomes with Prokka
> for i in *.fa; do prokka "$i" --addgenes --locustag "${i%.*}" --outdir "${i%.*}" --prefix "${i%.*}" --force --cpus 32; done
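The `${i%.*}` expansion in the loop above removes the last extension from the file name, so each genome gets a locus tag taken from its own file name. A small sketch to preview the tags before starting a long Prokka run; `locus_tag_for` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print the name that ${i%.*} would produce for a file.
# "%.*" strips the shortest trailing match of ".*", i.e. the last extension.
locus_tag_for() {
    i="$1"
    printf '%s\n' "${i%.*}"
}

# Dry run over the FASTA files in the current folder (nothing is annotated).
for i in *.fa; do
    [ -e "$i" ] || continue   # skip the literal "*.fa" when no file matches
    printf '%s -> %s\n' "$i" "$(locus_tag_for "$i")"
done
```

Only the last extension is removed, so a file named `my.genome.fa` would give `my.genome`.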
Collect the GFF files from each genome folder into a tm folder
> mkdir tm && find . -name "*.gff" -type f ! -path "./tm/*" -exec cp {} ./tm \;
Copy files to roary folder
> cp path/to/tm/* path/to/roary
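Roary expects Prokka-style GFF3 files that carry the nucleotide sequence at the end of the file, after a `##FASTA` line. Before running Roary it may be worth checking that every copied file has that section. A minimal sketch; `gffs_without_fasta` is a hypothetical helper name, not part of Roary or Prokka.

```shell
# Hypothetical helper: list the .gff files in a folder that have no
# "##FASTA" line, i.e. no embedded nucleotide sequence.
gffs_without_fasta() {
    dir="$1"
    for f in "$dir"/*.gff; do
        [ -e "$f" ] || continue                      # folder has no .gff files
        grep -q '^##FASTA' "$f" || printf '%s\n' "$f"
    done
}
```

For example, `gffs_without_fasta path/to/roary` prints nothing when every file contains the sequence section; any file it lists would probably need to be re-annotated or converted first.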
From the roary folder, run Roary to produce a core genome alignment
> roary -e --mafft -p 32 *.gff
Or run just the comparison, without the alignment
> roary -p 32 *.gff
If needed, change the minimum blastp percentage identity (the default is 95%). It's not advised to go below 90% unless you know what you're doing.
> roary -i 90 *.gff
Check the file gene_presence_absence.csv
Thank you so much for the detailed reply. I have all the GFF files and have already run the command roary -e --mafft -p 32 *.gff, but I could not identify where the gene/protein sequences or IDs for the core genes are written. Can you help me find them? Also, how do I extract the core genes from the file gene_presence_absence.csv?
Open the file gene_presence_absence.csv in Excel and select all rows where 'No. isolates' equals the number of analyzed genomes (strains). These genes are the core genes.
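The same filter can be applied on the command line instead of Excel. A minimal sketch, assuming that every field in gene_presence_absence.csv is double-quoted (so `","` can be used as the field separator even when an annotation contains a comma) and that the count column is headed `No. isolates`; `core_genes` is a hypothetical helper name, not a Roary command.

```shell
# Hypothetical helper: print the names of genes present in all genomes.
#   $1 = path to gene_presence_absence.csv
#   $2 = number of genomes (strains) that were analysed
core_genes() {
    awk -F'","' -v n="$2" '
        NR == 1 {
            # Find the "No. isolates" column by its header name.
            for (i = 1; i <= NF; i++) {
                h = $i; gsub(/"/, "", h)
                if (h == "No. isolates") col = i
            }
            next
        }
        col {
            c = $col; gsub(/"/, "", c)
            if (c + 0 == n + 0) {
                g = $1; gsub(/"/, "", g)   # first column is the gene name
                print g
            }
        }
    ' "$1"
}
```

With 30 strains, for example, `core_genes gene_presence_absence.csv 30 > core_gene_names.txt` would write one core gene name per line. The strain columns of the same rows hold the per-genome gene IDs for each core gene.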
Thank you so much, now I've got it.
I am not able to follow this explanation. Could you help me identify the core genes from the presence/absence file?
Could you please explain how to extract the SNPs from the final output of Roary?