Hello, this is my first question here, though it may be a simple one.
Normally I wouldn't care much about this, because I've always used SLURM arrays to submit jobs automatically. Recently, though, I moved to a school whose cluster is very difficult to work with (constant bus errors, partitions being called incorrectly, missing dependencies that I have to install manually and that may not be compatible with the available R version, etc.), so I've been aligning my sequences manually, one at a time or with a loop script.
This is, of course, VERY tedious, and I have to make sure I don't turn off my computer, etc. But it has shown me the need for some sort of progress bar in a number of bioinformatics tools, and I was wondering whether anybody has written some simple code for this, or whether there's an option hidden in the documentation that I've missed. Thank you for any help!
I really would appreciate anyone's ideas on this. This is my command:
hisat2 -p 4 -k 1 --min-intronlen 50 --max-intronlen 500000 -x Homo_sapiens.GRCh38.dna.toplevel.fa -U SRR12816722.fastq.gz -S SRR12816722.fastq.gz.hisat.sam
I loop it through everything or, if SLURM isn't working that day, open up consecutive windows and do each manually -_-
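A loop like that can at least report which sample it is on. Below is a minimal sketch, assuming single-end .fastq.gz files and the same index and options as the command above. The function name align_all and the DRY_RUN switch are hypothetical. This gives per-sample progress only, not progress within a single alignment.

```shell
# Hypothetical helper (not part of hisat2): align each FASTQ passed as an
# argument, one after another, using the hisat2 options from the command
# above. A timestamped "[i/N]" line is printed before and after each sample.
# Set DRY_RUN=1 (a made-up switch) to print the commands instead of running.
INDEX="Homo_sapiens.GRCh38.dna.toplevel.fa"  # -x value from the command above

align_all() {
  total=$#
  i=0
  for fq in "$@"; do
    i=$((i + 1))
    sample=$(basename "$fq" .fastq.gz)   # e.g. SRR12816722
    echo "$(date '+%F %T') [$i/$total] start $sample"
    # With DRY_RUN set, the line below expands to "echo hisat2 ...".
    ${DRY_RUN:+echo} hisat2 -p 4 -k 1 --min-intronlen 50 --max-intronlen 500000 \
      -x "$INDEX" -U "$fq" -S "$fq.hisat.sam"
    status=$?
    if [ "$status" -eq 0 ]; then
      echo "$(date '+%F %T') [$i/$total] done $sample"
    else
      echo "$(date '+%F %T') [$i/$total] FAILED $sample (exit $status)" >&2
    fi
  done
}
```

For example, run align_all *.fastq.gz > align_progress.log 2>&1, then use tail -f align_progress.log from another window to see which sample the loop has reached. hisat2's own alignment summary also goes to stderr, so it ends up in the same log.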
It may be worth your time to work with the cluster's sysadmin to ameliorate these issues. Embracing conda may also help with some of those problems.
As for tracking progress, I'd probably just echo the sample name on each loop iteration. If you're running multiple hisat2 commands, you could print the sample name before and after each hisat2 command to get some sense of where things stand. But if you want hisat2's progress within a single alignment, this will not be a trivial task unless hisat2 writes a progress log (which it should).

Consider using mosh (LINK) so that you can keep your work going on the server without worrying about being logged out. Use conda to install the programs you need. Unfortunately, admins may not like it if you run processes outside of SLURM, so you do run that risk. You may want to be judicious about how you use these things.

Unless hisat2 has progress reporting built in, you are not likely to be able to get a progress bar. Consider using other aligners if you must have something like that. bbmap.sh allows for this with its showprogress parameter (default showprogress=0). Its help text reads: "If positive, print a '.' every X reads."
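If per-sample echoes aren't enough, one rough workaround (not a hisat2 feature) is to compare the number of SAM records written so far with the number of reads in the input. Below is a sketch with several assumptions. It assumes single-end reads and output written to a file with -S. It also assumes -k 1 as in the command above, so hisat2 writes roughly one non-header line per read. Unaligned reads are included in that count, since --no-unal isn't used. sam_progress is a hypothetical name.

```shell
# Hypothetical helper: rough progress of a single-end hisat2 run that is
# still writing its SAM file. Assumes 4 FASTQ lines per read and about one
# non-header SAM line per read (-k 1, unaligned reads written). Output is
# buffered, so the count lags slightly behind what hisat2 has processed.
sam_progress() {
  fastq=$1
  sam=$2
  # Total reads = FASTQ lines / 4 (gzip -dc also reads .fastq.gz files).
  total=$(( $(gzip -dc "$fastq" | wc -l) / 4 ))
  if [ "$total" -eq 0 ]; then
    echo "no reads found in $fastq" >&2
    return 1
  fi
  # Records written so far = lines that are not @-prefixed header lines.
  done_reads=$(grep -vc '^@' "$sam")
  echo "$done_reads / $total reads ($(( 100 * done_reads / total ))%)"
}
```

For example, run sam_progress SRR12816722.fastq.gz SRR12816722.fastq.gz.hisat.sam from a second window while the alignment is running. Decompressing a large FASTQ takes a while, so you may prefer to count its reads once and reuse that number.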
mosh needs the sysadmin's blessing too, as it has both client and server parts. I know this because my sysadmin doesn't want to allow mosh. Another compromise with mosh is that it obliterates scrollback history: mosh will keep you connected, but forget about scrolling up to see the command you ran a few minutes ago. To get that back, you need to run tmux on the server and attach a mosh session to it. See: https://blog.filippo.io/my-remote-shell-session-setup/

I don't need or use mosh myself, but I was going by this on the mosh site.

I misremembered: there were missing dependencies (protocol buffers) that my sysadmin was not comfortable installing. My point is that one might still need the sysadmin's cooperation to run mosh, and I am not sure whether conda-installing mosh or its dependencies would work.
EDIT:
I got it working. Solution:
mosh --server=/remote/path/to/mosh-server username@host -- /remote/path/to/tmux a
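The tmux a at the end of that command attaches to a tmux session that must already exist on the server. Below is a minimal sketch of starting one in the background, so that a long alignment loop keeps running through disconnects whether or not mosh is attached. The function name start_session, the session name align and the script align_all.sh are hypothetical placeholders.

```shell
# Hypothetical helper: start a detached tmux session named $1 that runs the
# remaining arguments as a shell command. Reattach later with
# "tmux attach -t NAME" (or through mosh, as in the command above).
start_session() {
  name=$1
  shift
  if ! command -v tmux >/dev/null 2>&1; then
    echo "tmux not found on PATH" >&2
    return 1
  fi
  # Don't start a second copy of the job if the session is already running.
  if tmux has-session -t "$name" 2>/dev/null; then
    echo "session '$name' already exists; attach with: tmux attach -t $name"
    return 0
  fi
  # -d starts the session detached, -s names it; "$*" is the command to run.
  tmux new-session -d -s "$name" "$*"
  echo "started session '$name'; attach with: tmux attach -t $name"
}
```

For example, run start_session align 'bash align_all.sh > align.log 2>&1' on the server, then reattach with tmux attach -t align. The session closes when its command finishes, which is why the example redirects the output to a log file you can read afterwards.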
Thank you for the quick answers! I actually AM using a conda virtual environment, which is where I've installed all my packages. Many of them are Bioconductor-based, so it is simplest for me to use bioconda anyway, because installing them the normal way through BiocManager in R is a pain without, again, the other dependencies like V8.
Also, I did not know about mosh. I will try to make more use of it, since dropped sessions are a major problem I often run into with these sorts of jobs. Thank you.
It's too bad there isn't a simple way to get a progress bar for alignment, but I'm glad to learn about Snakemake (which I have also never used). I'll try it out. Thanks again, everyone.