Simple Progress Bar with Alignment
3.6 years ago

Hello, first question I've posted here though it may be simple.

Normally I wouldn't care much about this, because I've usually used SLURM job arrays to submit my jobs automatically. Recently, though, I moved to a school whose cluster is very difficult to work with (constant bus errors, partitions being called incorrectly, missing dependencies that I have to install manually and that may not be compatible with the available R version, etc.), so I've been aligning my sequences manually, either one at a time or with a loop script.

This is of course VERY tedious and I have to ensure I don't turn off my computer etc. But it has shown me the need for some sort of progress bar in a number of bioinformatic tools and I was wondering if anybody has developed any simple code or if there's some option hidden in the documentation that I've missed. Thank you for any help anyone has!

I really would appreciate anyone's ideas on this. This is my command:

hisat2 -p 4 -k 1 --min-intronlen 50 --max-intronlen 500000 -x Homo_sapiens.GRCh38.dna.toplevel.fa -U SRR12816722.fastq.gz -S SRR12816722.fastq.gz.hisat.sam

I loop it through everything or, if SLURM isn't working that day, open up consecutive windows and do each manually -_-

Progress Hisat2 Alignment Sequencing • 1.9k views

It seems it may be worth your time to work with the cluster sys admin to help ameliorate these issues. Embracing conda may also help alleviate some of those problems.

As for tracking progress, I'd probably just echo the sample name for each loop.
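A minimal sketch of that idea, assuming the samples are gzipped FASTQ files in the current directory. The function names align_one and run_with_counter are made up for illustration; the hisat2 options are copied from the question.

```shell
# Align one sample with the options from the original post.
align_one() {
    fq="$1"
    hisat2 -p 4 -k 1 --min-intronlen 50 --max-intronlen 500000 \
        -x Homo_sapiens.GRCh38.dna.toplevel.fa \
        -U "$fq" -S "${fq}.hisat.sam"
}

# Run a per-sample command over a list of files, echoing a counter.
#   $1       name of the function or command to run on each file
#   $2...    input files
run_with_counter() {
    cmd="$1"; shift
    total=$#
    i=0
    for f in "$@"; do
        i=$((i + 1))
        echo "[$i/$total] $(date '+%H:%M:%S') starting $f"
        if "$cmd" "$f"; then
            echo "[$i/$total] finished $f"
        else
            # Report the failure on stderr and carry on with the next sample.
            echo "[$i/$total] FAILED $f" >&2
        fi
    done
}

# Usage:
#   run_with_counter align_one *.fastq.gz
```

This only reports which sample the loop is on, not how far hisat2 has got within a sample.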


If you're running multiple hisat2 commands, you could print the sample name before and after each hisat2 command to give you some sense of where things stand. But if you're looking for hisat2's progress in alignment, unless hisat2 writes a progress log (which it should), this will not be a trivial task.
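One possible workaround, not a hisat2 feature: stream the input through pv, a separate tool that draws a progress bar for data passing through a pipe. The sketch below assumes pv is installed (it falls back to cat with no bar if it is not) and relies on hisat2 reading from stdin when -U is given "-". The function name feed_reads is made up for illustration. The bar shows how much of the compressed input has been read so far, which is only an approximation of alignment progress.

```shell
# Stream a gzipped FASTQ to stdout, decompressed.
# If pv is available, its bar (drawn on stderr) tracks how many bytes of
# the compressed file have been consumed so far.
feed_reads() {
    if command -v pv >/dev/null 2>&1; then
        pv "$1"     # progress bar on stderr, data on stdout
    else
        cat "$1"    # fallback: same data, no bar
    fi | gzip -dc
}

# Usage, with the options from the original post:
#   feed_reads SRR12816722.fastq.gz |
#     hisat2 -p 4 -k 1 --min-intronlen 50 --max-intronlen 500000 \
#       -x Homo_sapiens.GRCh38.dna.toplevel.fa \
#       -U - -S SRR12816722.fastq.gz.hisat.sam
```

Because the aligner pulls reads from the pipe as it works, the bar advances roughly in step with the alignment, but buffering means it can run slightly ahead.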


Consider using mosh so that you can keep your work going on the server without worrying about logouts. Use conda to install the programs you need. Unfortunately, admins may not like it if you run processes outside of SLURM, so you do run that risk. You may want to be judicious about how you use these things.

Unless hisat2 has progress checking built in, you are not likely to be able to get a progress bar. Consider using other aligners if you must have something like that. bbmap.sh, for example, has a showprogress option, documented as: showprogress=0 (if positive, print a '.' every X reads).


mosh needs sysadmin's blessing too, as it has both client and server parts. I know this because my sysadmin doesn't want to allow mosh. Another compromise with mosh is that it obliterates scrollback history - mosh will keep you connected, but forget about scrolling up to see the command you ran a few minutes ago. To do that, you need to run tmux on the server and attach a mosh session to that. See: https://blog.filippo.io/my-remote-shell-session-setup/


I don't need/use mosh, but I was going by this statement on the mosh site:

"You don't need to be the superuser to install or run Mosh. The client and server are executables run by an ordinary user and last only for the life of the connection."


I misremembered - there were missing dependencies (protocol buffers) that my sysadmin was not comfortable installing. My point is that one might still need sysadmin co-operation to run mosh, and I am not sure if conda-installing mosh or mosh dependencies would work.

EDIT:

I got it working. Solution:

  1. Use conda to install mosh and tmux
  2. Run mosh from client so: mosh --server=/remote/path/to/mosh-server username@host -- /remote/path/to/tmux a

Thank you for the quick answers! I actually AM using a conda virtual environment, which is where I have installed all my packages (many of which are Bioconductor-based, so it is simplest for me to use bioconda anyway, because installing through BiocManager in R is a pain without, again, the other dependencies like V8).

Also, I did not know about mosh. I will try and make more use of that - that is a major problem I often run into with these sorts of jobs - thank you.

It's too bad that there isn't a simple way to output a progress bar for alignment, but I am glad to learn about snakemake (which I also have never used). I'll try this out. Thanks again everyone.

3.6 years ago

Not a definitive answer but...

"missing dependencies which I have to manually install and which may not be compatible with the R version available"

I second the suggestion that others have mentioned about using conda and conda environments.

"I've just been manually aligning my sequences one at a time or with a loop script"

This and the rest of your post suggest that snakemake may be what you need.

Incidentally, snakemake won't tell you the progress of an individual job - I don't think there is any reliable and general way of doing that as it depends on the particular program you use. However, snakemake will tell you how many jobs have been completed, how many are left to execute, and the shell command that has been submitted. There is also a --dry-run option that shows what would be done without executing anything.

For long-running pipelines where I need to disconnect from the cluster I just do:

nohup snakemake [options] &
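
A slightly fuller sketch of the same pattern, which also sends the output to a log file and records the process ID so the job can be checked after reconnecting. The function name run_detached and the log file name are made up for illustration, and it works equally well for a plain loop script.

```shell
# Start a command detached from the terminal, with stdout and stderr
# captured in a log file. Prints the PID of the background job.
#   $1       path of the log file
#   $2...    the command to run and its arguments
run_detached() {
    log="$1"; shift
    nohup "$@" > "$log" 2>&1 &
    echo $!
}

# Usage:
#   pid=$(run_detached snakemake.log snakemake [options])
#   tail -f snakemake.log                  # follow the output; Ctrl-C stops tail, not the job
#   kill -0 "$pid" && echo "still running" # check on it later
```

Note that nohup only protects the job from the hangup signal sent on logout; it does not survive a node reboot or a scheduler kill.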