Hey there
Beginner to all of this, so apologies if I'm slow with the terminology. I'm teaching myself RNA-seq, and so far the online lessons have been doing everything ** in the macOS Terminal with batch scripts. While I'm fine with using the terminal, I was hoping I could do all of this in RStudio (or even Python), within ONE script/file. Of course, the script would contain several functions (one for downloading, one for trimming, etc.), but since I can't find any resources on it or sample R code, I'm worried I'm thinking about this all wrong.
What do you guys think? Is it silly to do everything in R? Is there no way I can connect RStudio to an HPC?
**downloading files, quality check, trimming, indexing, alignment - haven't gotten to visualization and counting yet
Have a look at `rstudio-server`. HPC and interactive analysis are usually not really compatible, as you first need to submit a job to book a node, and the node would then need to run RStudio, which in turn needs to be available as a module (often not the case). Installing it manually is a pain without admin rights; I'm not sure it is even possible, as it has a ton of dependencies and afaik needs to copy some stuff to `/var` etc. That is all cumbersome.

Is the job really so demanding that you need an HPC as the backend, or can you do it on a local machine? The entire trimming and alignment part is usually not done in R; one uses pipelines that are triggered via the Unix command line. In R there would be Rsubread, which wraps the subread aligner and featureCounts for count table creation, but this takes time to run, so it should be submitted as a batch job rather than having an RStudio window open for it.

As for RStudio on HPC, I would contact the HPC admin and ask whether they have `rstudio-server` in place. You can also run it via Singularity (a container engine), but this is a bit more advanced than you probably are at this point. Interactive analysis is usually done locally. So get the alignment etc. done on the HPC, then download the count table and analyse it locally. RNA-seq analysis like DEGs and clustering is not hardware-intensive unless you have thousands of samples. Hope this helps a bit.

I typically use R Markdown for running my R analyses, and when necessary I use `system` to run bash scripts for certain steps in a pipeline. Not saying it's the one and only way to integrate the two, but it works for me.
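
If you go the Python route mentioned in the question, the same idea as `system` in R can be sketched with `subprocess.run` from the standard library. This is only a minimal sketch, not a recommended pipeline: `run_step`, `run_pipeline`, and `PLACEHOLDER_STEPS` are hypothetical names, and the commands only print a message. They stand in for whatever download, trimming, and alignment tools your lessons use.

```python
import subprocess
import sys


def run_step(name, cmd):
    """Run one pipeline step as an external command and return its stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    so a failed step stops the whole pipeline.
    """
    print(f"[{name}] running: {' '.join(cmd)}")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def run_pipeline(steps):
    """Run (name, command) pairs in order; return a dict of name -> stdout."""
    outputs = {}
    for name, cmd in steps:
        outputs[name] = run_step(name, cmd)
    return outputs


# Placeholder commands (assumption): each one only prints a message.
# Replace them with the real command lines from your lessons.
PLACEHOLDER_STEPS = [
    ("download", [sys.executable, "-c", "print('download placeholder')"]),
    ("trim", [sys.executable, "-c", "print('trim placeholder')"]),
    ("align", [sys.executable, "-c", "print('align placeholder')"]),
]

if __name__ == "__main__":
    for name, out in run_pipeline(PLACEHOLDER_STEPS).items():
        print(f"[{name}] output: {out.strip()}")
```

Passing each command as a list rather than a single string avoids shell quoting problems, and `check=True` makes the script stop at the first step that fails instead of aligning against files that were never trimmed. For long-running steps on an HPC you would still submit the script as a batch job rather than run it interactively.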