Question

Tools to generate spliced/unspliced counts

1

Entering edit mode

3.8 years ago

PianoEntropy ▴ 70

I have 10x sc RNA-seq data for which I want to do an RNA velocity analysis using scvelo and CellRank. However, the bottleneck for me is generating the spliced/unspliced counts from the aligned data (BAM files or 10x output). I've found two tools that are able to do this:

(1) Velocyto. The original tool for calculating spliced/unspliced counts. The CLI tool is simple to use and has an option velocyto run10x which should work directly on 10x data. One potential issue with velocyto is that is seems to be no longer updated. The last commits to the git repository are from ~3 years ago (as of 07/2021).

(2) STARsolo. The updated version of STAR has a built-in tool to calculate spliced/unspliced ratios. Basically adding --soloFeatures Gene Velocyto as option should produce this result. The issue is that the output has a different form (spliced.mtx and unspliced.mtx files) from the .loom files required for scvelo.

Hence neither of the tools is working for me, even after quite some troubleshooting (a few of the error files attached).

(1) How does one convert the output of STARsolo into a form that can be read in python by scvelo (e.g. a loom file like velocyto does)?

(2) Are there are other tools out there, that can generate splicing data for sc-RNAseq data from 10x, which are directly compatible with scvelo?

=====================Error logs============================================

Velocyto run 10x

I created a conda environment with velocyto and all its dependencies and simply ran velocyto run10x -vvv --dtype uint32 $DATA10x_PATH $GTF_PATH.

OMP: Info #270: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead. 2021-07-06 18:17:21,016 - ERROR - This is an older version of cellranger, cannot check if the output are ready, make sure of this yourself 2021-07-06 18:17:21,016 - ERROR - Can not locate the barcodes.tsv file! Traceback (most recent call last): File "/Users/*/opt/miniconda3/envs2/velocyto/bin/velocyto", line 11, in <module> sys.exit(cli()) File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1137, in __call__ return self.main(*args, **kwargs) File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1062, in main rv = self.invoke(ctx) File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1668, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1404, in invoke return ctx.invoke(self.callback, **ctx.params) File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 763, in invoke return __callback(*args, **kwargs) File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/velocyto/commands/run10x.py", line 91, in run10x bcfile = bcmatches[0] IndexError: list index out of range

The 10x output (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz) are in subfolder count/sample_feature_bc_matrix, rather than outs/filtered_gene_bc_matrices as required by velocyto. However, renaming these folders does not solve the problem.

==============Velocyto run=========

I also tried running velocyto directly on the BAM files, similar to velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf possorted_genome_bam.bam mm10_annotation.gtf. However, here I get a Permission error with samtools.

==============STARsolo ===========

This gets more complicated with more options, but I was able to run

STAR --genomeDir $GENOME_PATH \ --runThreadN 12 \ --readFilesIn $FASTQ_FILES_1 $FASTQ_FILES_2 \ --outFileNamePrefix $RESULTS_PATH \ --outSAMtype BAM SortedByCoordinate \ --outSAMattributes Standard \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMismatchNmax 1 \ --outFilterIntronMotifs RemoveNoncanonical \ --outSJfilterOverhangMin 30 20 20 20 \ --outSJfilterCountUniqueMin 3 2 2 2 \ --clipAdapterType CellRanger4 --outFilterScoreMin 30 \ --soloType CB_UMI_Simple \ --soloFeatures Gene Velocyto \ --soloMultiMappers Unique \ --soloCBwhitelist $CB_WHITE_LIST \ --soloUMIlen 12 \ --soloCBmatchWLtype 1MM --soloUMIfiltering MultiGeneUMI_All --soloUMIdedup 1MM_CR \ --soloCellFilter EmptyDrops_CR 3000 0.99 10 45000 90000 5000 0.01 20000 0.01 10000

without problems and generate most of the alignment required. It generated a separate folder for the velocyto files, with files ambiguous.mtx barcodes.tsv features.tsv spliced.mtx unspliced.mtx. However, as mentioned I had problems working with these files in scvelo.

velocyto single-cell splicing RNA-seq scvelo • 8.4k views

ADD COMMENT • link 3.8 years ago by PianoEntropy ▴ 70

score 3 · Accepted Answer · 2021-07-06

3

Entering edit mode

3.8 years ago

ATpoint 87k

If you have the fastq files available you can use Alevin as described in their tutorial: https://combine-lab.github.io/alevin-tutorial/2020/alevin-velocity/

ADD COMMENT • link 3.8 years ago by ATpoint 87k

2

Entering edit mode

Or follow the alevin-fry tutorial here : https://combine-lab.github.io/alevin-fry-tutorials/2021/alevin-fry-velocity/. Would be happy to provide feedback if you run into any issues.

ADD REPLY • link 3.8 years ago by Rob 7.1k