I have 10x sc RNA-seq data for which I want to do an RNA velocity analysis using scvelo and CellRank. However, the bottleneck for me is generating the spliced/unspliced counts from the aligned data (BAM files or 10x output). I've found two tools that are able to do this:
(1) Velocyto. The original tool for calculating spliced/unspliced counts. The CLI tool is simple to use and has an option velocyto run10x
which should work directly on 10x data. One potential issue with velocyto is that is seems to be no longer updated. The last commits to the git repository are from ~3 years ago (as of 07/2021).
(2) STARsolo. The updated version of STAR has a built-in tool to calculate spliced/unspliced ratios. Basically adding --soloFeatures Gene Velocyto
as option should produce this result. The issue is that the output has a different form (spliced.mtx and unspliced.mtx files) from the .loom files required for scvelo.
Hence neither of the tools is working for me, even after quite some troubleshooting (a few of the error files attached).
(1) How does one convert the output of STARsolo into a form that can be read in python by scvelo (e.g. a loom file like velocyto does)?
(2) Are there are other tools out there, that can generate splicing data for sc-RNAseq data from 10x, which are directly compatible with scvelo?
=====================Error logs============================================
Velocyto run 10x
I created a conda environment with velocyto and all its dependencies and simply ran
velocyto run10x -vvv --dtype uint32 $DATA10x_PATH $GTF_PATH
.
OMP: Info #270: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
2021-07-06 18:17:21,016 - ERROR - This is an older version of cellranger, cannot check if the output are ready, make sure of this yourself
2021-07-06 18:17:21,016 - ERROR - Can not locate the barcodes.tsv file!
Traceback (most recent call last):
File "/Users/*/opt/miniconda3/envs2/velocyto/bin/velocyto", line 11, in <module>
sys.exit(cli())
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/velocyto/commands/run10x.py", line 91, in run10x
bcfile = bcmatches[0]
IndexError: list index out of range
The 10x output (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz) are in subfolder count/sample_feature_bc_matrix
, rather than outs/filtered_gene_bc_matrices
as required by velocyto. However, renaming these folders does not solve the problem.
==============Velocyto run=========
I also tried running velocyto directly on the BAM files, similar to velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf possorted_genome_bam.bam mm10_annotation.gtf
. However, here I get a Permission error with samtools
.
==============STARsolo ===========
This gets more complicated with more options, but I was able to run
STAR --genomeDir $GENOME_PATH \
--runThreadN 12 \
--readFilesIn $FASTQ_FILES_1 $FASTQ_FILES_2 \
--outFileNamePrefix $RESULTS_PATH \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes Standard \
--outFilterType BySJout \
--outFilterMultimapNmax 1 \
--outFilterMismatchNmax 1 \
--outFilterIntronMotifs RemoveNoncanonical \
--outSJfilterOverhangMin 30 20 20 20 \
--outSJfilterCountUniqueMin 3 2 2 2 \
--clipAdapterType CellRanger4 --outFilterScoreMin 30 \
--soloType CB_UMI_Simple \
--soloFeatures Gene Velocyto \
--soloMultiMappers Unique \
--soloCBwhitelist $CB_WHITE_LIST \
--soloUMIlen 12 \
--soloCBmatchWLtype 1MM --soloUMIfiltering MultiGeneUMI_All --soloUMIdedup 1MM_CR \
--soloCellFilter EmptyDrops_CR 3000 0.99 10 45000 90000 5000 0.01 20000 0.01 10000
without problems and generate most of the alignment required. It generated a separate folder for the velocyto files, with files ambiguous.mtx
barcodes.tsv
features.tsv
spliced.mtx
unspliced.mtx
. However, as mentioned I had problems working with these files in scvelo.
Or follow the alevin-fry tutorial here : https://combine-lab.github.io/alevin-fry-tutorials/2021/alevin-fry-velocity/. Would be happy to provide feedback if you run into any issues.