Hi everyone! I read the excellent guide on Biostars about snakemake and thought I might find some help here. It has been hard to find anyone who has even heard of snakemake.
I am posting here because I am hoping someone with a little more knowledge about conda and snakemake can help me out. I am doing a bachelor's project involving analyses of bacterial WGS. My research group has been talking to another research group, which has developed BacDist, and they are very interested in me getting it to work. Unfortunately, due to the COVID situation, the other research group is on lockdown and all of them are infection specialists, so I think I can expect very little instruction from there until further notice. I haven't done anything with conda or snakemake before. Nevertheless, I like a challenge, so I got to work setting up a conda environment with all of the listed dependencies, and that was no biggie.
Now to the errors: when attempting a dry run I get a ModuleNotFoundError. It appears on line 4 of the Snakefile, where SeqIO should be imported from Bio, I assume from Bioperl. When I make an environment using Biopython instead of Bioperl, this line works, but other errors happen, and since the listed dependencies include only Bioperl, not Biopython, I assume I am forced to use Bioperl.
I have used conda to install Bioperl. I would also appreciate input on how to quickly check whether SeqIO and Bio are available without running the entire script, and which path they should be associated with.
I tried using cpanm instead, but this leads to a bunch of errors because it cannot install dependencies. I also see that it tries to install these into my home directory instead of the conda environment directory, so I might be missing something there. Anyway, I don't know if this is the path I want to head down, so I am hoping for some feedback here before I keep flicking switches and end up having to redo everything (again).
I wonder a little bit why this snakemake script doesn't include an environment.yml? It seems like that would make it a lot more portable.
I will be very grateful for anything and everything you can tell me!
In this context, Bio and SeqIO are Biopython, not Bioperl. A Snakefile is Python-based, so the import on line 4 is a Python import.
It seems that SeqIO is only needed to check whether the GenBank file for the reference is correct. If you make sure to provide only a valid gbk file, you might as well just comment out the SeqIO import and the check_for_plasmids() function plus its call (lines 47-53).
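On the question of checking for Bio and SeqIO without running the whole workflow, here is a minimal sketch using only the Python standard library. It assumes you run it with the Python interpreter of the activated conda environment; module_location is just a hypothetical helper name I made up for this example, not something from BacDist or snakemake.

```python
import importlib.util


def module_location(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package is missing, e.g. "Bio" for "Bio.SeqIO".
        return None
    return spec.origin if spec is not None else None


# Report where each module would come from in the current environment.
for name in ("Bio", "Bio.SeqIO"):
    print(name, "->", module_location(name) or "not found")
```

If Biopython is installed in the active environment, the printed paths should normally point into that environment's site-packages directory rather than your home directory. If you see "not found", the interpreter you ran it with cannot see Biopython.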