Hi,
I am trying to run Roary on a set of skin isolate sequences to build a pan-genome for further comparative analyses.
I have QC'd, assembled (Unicycler) and annotated (Prokka) my sequences, which are all Staphylococcus capitis. To confirm the species, I extracted the 16S rRNA, rpoB and gap genes from the .gbk files and ran them through BLAST.
I ran Roary with this command:
roary -p 12 -f epi_95_mafft -e --mafft -n -v -r -i 95 ~/comp_genomics/prokka_annotation/epi_annotation/epi_gffs/*.gff
The command appears to run fine, but this error message pops up at the end:
2019/02/25 14:31:10 Running command: mafft --auto --quiet pan_genome_sequences/group_5608.fa > pan_genome_sequences/group_5608.fa.aln
All arguments to easy_init should be either an integer log level or a hash reference. at /usr/local/share/sanger-pathogens-Roary-459fd8e/lib/Bio/Roary/CommandLine/Common.pm line 22
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
After searching for this error message, I read that it can mean the sequences are not closely related. To check this I ran Mash, comparing each of my assemblies against the S. capitis type strain genome from NCBI. For every assembly the Mash distance was around 0.02 (p-value 0), with ~200-600/1000 matching hashes.
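For anyone wanting to sanity-check Mash numbers like these: Mash estimates the Jaccard index j as (matching hashes / sketch size) and converts it to a distance with D = -(1/k) * ln(2j / (1 + j)), where D roughly approximates 1 - ANI. The sketch below assumes Mash's default k = 21 and sketch size 1000; the actual sketch parameters used above aren't stated, and the function name is hypothetical.

```python
import math


def mash_distance(shared_hashes: int, sketch_size: int = 1000, k: int = 21) -> float:
    """Mash distance from the shared-hash count (assumes default k=21, s=1000)."""
    if shared_hashes <= 0:
        # No shared hashes: treat as maximal distance (convention used in this sketch).
        return 1.0
    j = shared_hashes / sketch_size  # Jaccard estimate
    return -math.log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    for shared in (200, 400, 600):
        d = mash_distance(shared)
        # D is roughly 1 - ANI, so print an approximate ANI alongside it.
        print(f"{shared}/1000 shared -> distance {d:.4f}, ~{(1 - d) * 100:.1f}% ANI")
```

Because the distance depends on k, the same matching-hash count maps to different distances under non-default sketch settings.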
If anyone knows why this error appears or how to fix it, I'd be grateful for your help!
Thanks in advance!
Just a guess, but maybe your assembly includes a very short (just a few bp long) or even empty sequence, for which it's not possible to determine whether it's DNA or protein.
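One way to test this guess is to scan the FASTA files (e.g. pan_genome_sequences/group_5608.fa, or the assembly contigs) for records with no or very few sequence letters. This is a minimal sketch, not part of Roary: the script, function names, and the 10-letter threshold are all hypothetical choices, and only alphabetic characters are counted, since the warning complains about a sequence "without letters".

```python
import sys


def short_records(lines, min_len=10):
    """Return (record_id, letter_count) for FASTA records with fewer than min_len letters."""
    hits = []
    record_id, length = None, 0
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # Close out the previous record before starting a new one.
            if record_id is not None and length < min_len:
                hits.append((record_id, length))
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            length = 0
        elif line:
            # Count only letters, so gap-only or symbol-only lines count as empty.
            length += sum(c.isalpha() for c in line)
    if record_id is not None and length < min_len:
        hits.append((record_id, length))
    return hits


def main(paths, min_len=10):
    # Hypothetical usage: python find_short_records.py pan_genome_sequences/*.fa
    for path in paths:
        with open(path) as handle:
            for rec_id, length in short_records(handle, min_len):
                print(f"{path}\t{rec_id}\t{length}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Any record reported with a count of 0 would be a likely trigger for the "Got a sequence without letters" warning.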