BD Rhapsody software - single cell seq
20 months ago

Dear all,

Has anyone worked with BD Rhapsody single cell data before?

I'm sure there are alternatives such as Dropseqpipe, but has anyone found a decent solution for these data? The library design is not that simple either, with variable bases in the V2 scheme: https://teichlab.github.io/scg_lib_structs/methods_html/BD_Rhapsody.html

Thanks

Edit: so local usage is possible, as LChart says below.

You basically have to install a Common Workflow Language (CWL) runner following their PDF instruction guide, then download their repository, which contains just yml and cwl files, from https://bitbucket.org/CRSwDev/cwl/src/master/. Then you edit the yml files to point to your input files (this should be easy).

The next step is to run the pipeline with, e.g.,

cwl-runner --outdir out1 rhapsody_wta_1.9.1.cwl template_wta_1.9.1.yml

and then to experience cryptic CWL errors. I haven't been successful yet, and I am not sure whether the problem is with the CWL from BD Rhapsody or with my yml syntax (which appears valid).
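One cheap thing to rule out before blaming the CWL is a typo in a file path inside the yml. The following is a minimal sketch, not part of the BD tooling: check_yaml_locations is a hypothetical helper that assumes each input is given as an absolute path on its own "location:" line, either unquoted or in double quotes, and it does no real YAML parsing.

```shell
# Hypothetical helper: print every `location:` path in a yml file that does
# not exist on disk. Assumes absolute paths, one `location:` key per line.
check_yaml_locations() {
    # Extract the value after `location:`, dropping optional double quotes
    # and trailing whitespace, then test each path for existence.
    sed -n 's/^[[:space:]]*location:[[:space:]]*"\{0,1\}\([^"]*[^"[:space:]]\)"\{0,1\}[[:space:]]*$/\1/p' "$1" |
    while IFS= read -r path; do
        [ -e "$path" ] || echo "missing: $path"
    done
}

# Example (file name taken from the command above):
#   check_yaml_locations template_wta_1.9.1.yml
```

If this prints nothing, all listed paths exist, and the problem is more likely elsewhere.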

Edit 2: I was successful with the 2.0 version of the cwl some time ago. Previous versions did not work for me.

Edit 3: Using a custom reference

Coming back to this in 2024 with new data: make sure you don't overlook the file make_rhap_reference_2.0.cwl in the bitbucket repo subfolder. If your GTF fails because its transcripts and exons do not have the expected biotype, turn GTF filtering off.

cwl-runner make_rhap_reference_2.0.cwl  --Genome_fasta chromosomes.fasta --Gtf some.gtf --Filtering_off
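Before turning filtering off, it can help to see which biotypes the GTF actually carries. This is a minimal sketch under an assumption: that the biotype sits in a gene_biotype or gene_type attribute in column 9 (the Ensembl and GENCODE conventions, respectively). Which attribute the BD reference builder actually inspects is not confirmed here, and count_gtf_biotypes is a hypothetical helper name.

```shell
# Hypothetical helper: tabulate biotype values found in column 9 of an
# uncompressed GTF. Records without either attribute are counted separately.
count_gtf_biotypes() {
    awk -F '\t' '
        $0 ~ /^#/ { next }                      # skip header/comment lines
        {
            if (match($9, /(gene_biotype|gene_type) "[^"]*"/)) {
                s = substr($9, RSTART, RLENGTH)
                sub(/^[^"]*"/, "", s)           # drop the key and opening quote
                sub(/"$/, "", s)                # drop the closing quote
                counts[s]++
            } else {
                counts["(no biotype attribute)"]++
            }
        }
        END { for (k in counts) printf "%s\t%d\n", k, counts[k] }
    ' "$1" | sort
}

# Example (file name taken from the command above):
#   count_gtf_biotypes some.gtf
```

A large "(no biotype attribute)" count would be consistent with the filtering failure described above.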

Edit 4: TMPDIR

I had problems with TMPDIR because it was on a partition that Docker could not mount, and I got a permission-denied error when the pipeline was extracting the reference tar.gz, even though the fasta and gtf were now OK. Setting TMPDIR to a large local directory (not NFS etc.) works.

TMPDIR should be on a large partition anyway, since the pipeline tends to fill up /tmp:

export TMPDIR=/path/to/your/large/partition
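A quick sanity check on the chosen directory can save a failed run. The sketch below only verifies that the directory exists, is writable, and has a minimum amount of free space. It cannot tell whether Docker is able to mount it or whether it is on NFS. The helper name check_tmpdir and the 100 GB default threshold are assumptions for illustration.

```shell
# Hypothetical helper: check that a directory is usable as TMPDIR.
# $1: directory, $2: minimum free space in GB (default 100, an arbitrary guess)
check_tmpdir() {
    ct_dir=$1
    ct_min_gb=${2:-100}
    [ -d "$ct_dir" ] || { echo "not a directory: $ct_dir" >&2; return 1; }
    [ -w "$ct_dir" ] || { echo "not writable: $ct_dir" >&2; return 1; }
    # df -P gives one line per filesystem; column 4 is available 1K blocks.
    ct_avail_kb=$(df -Pk "$ct_dir" | awk 'NR == 2 { print $4 }')
    ct_avail_gb=$((ct_avail_kb / 1024 / 1024))
    if [ "$ct_avail_gb" -lt "$ct_min_gb" ]; then
        echo "only ${ct_avail_gb} GB free in $ct_dir (wanted ${ct_min_gb})" >&2
        return 1
    fi
    echo "ok: $ct_dir has ${ct_avail_gb} GB free"
}

# Example:
#   check_tmpdir "$TMPDIR" 200 || exit 1
```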
bd rhapsody single-cell • 2.7k views
14 months ago
Darked89 4.7k

I have managed to run Rhapsody on a compute node of an HPC Slurm cluster. Since it was not a head node, it had no network connection to the outside world, and no Docker.

You will need:

  • javascript/node on the PATH. No clue why it sometimes did not work with the container; maybe toil was more picky.
  • apptainer (make sure you have unsquashfs; I got mine from conda via squashfs-tools)
  • in $CWL_SINGULARITY_CACHE: bdgenomics_rhapsody:2.0.sif, maybe also node_alpine.sif. You may sftp these from your workstation if the HPC head node also has restricted connections to the outside.
  • cwltool
  • TMPDIR set to some large partition, since the pipeline tends to fill up /tmp
  • reference file from:

http://bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/
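Since a missing tool on an offline compute node tends to surface only as an obscure CWL error, the list above can be checked up front. This is a minimal sketch: check_tools is a hypothetical helper, and the tool names in the example are simply the ones listed above (adjust if your site calls the container runtime singularity instead of apptainer).

```shell
# Hypothetical helper: report which of the given commands are on the PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    ct_status=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found:   $ct_tool"
        else
            echo "MISSING: $ct_tool"
            ct_status=1
        fi
    done
    return "$ct_status"
}

# Example, e.g. near the top of the SLURM script:
#   check_tools node apptainer unsquashfs cwltool || exit 1
```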

My script:

#!/usr/bin/bash

#SBATCH --job-name=rhapL1
#SBATCH --nodes=1
#SBATCH --time=10:00:00
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --partition=foobar

export CWL_SINGULARITY_CACHE=/path/to/singu_cache/
export PATH=/path/to/soft/progs/node_current/bin:$PATH

cwltool --singularity \
--outdir /some/output/path/rhapsody_lane1  \
--debug \
--cachedir /some/output/path/rhapsody_lane1/cache \
rhapsody_pipeline_2.0.cwl pipeline_inputs_lane1.yml

The above is a quite brain-dead attempt at executing the whole pipeline sequentially on a single compute node. But at least it worked.

I was also experimenting with toil, but so far without success with Rhapsody.

Last but not least: if you are curious, poke around inside the container. There is a compiled QualCLAlign program which, I guess, executes STAR etc.

Edit: updated the reference file link.


Hello!

Thank you for this great solution. I used the same approach to run the BD 2.2 version (multiome). In my case, there was no need to export the PATH to node; I just loaded the module from the cluster. In case it is useful, I pulled the image using singularity:

singularity pull docker://bdgenomics/rhapsody:2.2

I also use a conda env for cwltool.

Cheers


Glad it worked for you with the updated version of Rhapsody ;).

It would be great if at some point we could get it working in parallel using toil.

20 months ago
LChart 4.7k

Yes, BD Rhapsody can be run locally. In fact, it uses Docker, so the setup isn't really all that bad: https://www.bdbiosciences.com/content/dam/bdb/marketing-documents/BD_Single_Cell_Multiomics_Analysis_Setup_User_Guide.pdf

If you are deploying somewhere without Docker, you will run into trouble, as one particular step of the pipeline calls Docker from within Docker, so no amount of reverse-engineering the environment can get around it.

It should be pointed out that the FAQ you link does not say the software must be used via the Seven Bridges platform, only that it cannot be downloaded from Seven Bridges itself:

You cannot download the software from Seven Bridges. Please follow the BD Single-Cell Multiomics Analysis Setup User Guide on installing the pipeline for local use.

3 days ago
ATpoint 86k

2025 update, using pipeline version v2.2.1 on an HPC SLURM cluster with Apptainer available as a module. That module also provides Node.js.

Most critical: install cwl-runner strictly with pip, not via conda or anything else. Anything other than pip resulted in obscure pipeline errors that I could not debug. It might be some sort of version or dependency clash; I did not find out.

mamba create --name bd_rhapsody python
mamba activate bd_rhapsody
pip install cwlref-runner
module load Apptainer

As said, our Apptainer (Singularity) module seems to load Node.js already. If that is not the case for you, you could install it via conda/mamba.

From there I simply executed the commands as instructed in the manual, for example for an AbSeq-only experiment:

# Typical SLURM headers ...
module load Apptainer
mamba activate bd_rhapsody
cwl-runner --cachedir $(pwd)/cache --outdir $(pwd)/out --singularity rhapsody_pipeline_2.2.1.cwl pipeline_inputs.yml

Be sure to set a cachedir; otherwise the pipeline spams (and maybe overloads?) /tmp. Both the cache and output directories should be on partitions with lots of free space; on an HPC that would typically be a scratch partition.

The content of pipeline_inputs.yml is:

#!/usr/bin/env cwl-runner
cwl:tool: rhapsody

Reads:

 - class: File
   location: "/scratch/.../foo_S3_L005_R1_001.fastq.gz"

 - class: File
   location: "/scratch/.../foo_S3_L005_R2_001.fastq.gz"

AbSeq_Reference:
 - class: File
   location: "/scratch/.../abseq.fa"

Cell_Calling_Data: AbSeq
Putative_Cell_Call: AbSeq

That worked quite fine "on my machine". No need to manually pull any containers.
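With several samples or lanes, an inputs file like the one above can be generated rather than hand-edited. A minimal sketch, assuming the same AbSeq-only layout and one R1/R2 pair per run; write_abseq_inputs is a hypothetical helper, and the keys are copied from the yml shown above rather than from the BD manual.

```shell
# Hypothetical helper: write an AbSeq-only inputs file mirroring the yml above.
# $1: R1 FASTQ, $2: R2 FASTQ, $3: AbSeq reference FASTA, $4: output yml path
write_abseq_inputs() {
    cat > "$4" <<EOF
#!/usr/bin/env cwl-runner
cwl:tool: rhapsody

Reads:
 - class: File
   location: "$1"
 - class: File
   location: "$2"

AbSeq_Reference:
 - class: File
   location: "$3"

Cell_Calling_Data: AbSeq
Putative_Cell_Call: AbSeq
EOF
}

# Example (paths are placeholders):
#   write_abseq_inputs /path/to/R1.fastq.gz /path/to/R2.fastq.gz /path/to/abseq.fa pipeline_inputs.yml
```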
