Question

In need of help with my RNA velocity trajectory inference pipeline!

0

7 months ago

phhelou5 ▴ 10

Hello everyone,

I’ve been trying to perform RNA velocity analysis and have run into a few issues with my results. I was hoping someone with more experience could help me out!

Basically, I’m just trying to generate a figure that shows, in an unbiased way, a trajectory analysis of stem cells transitioning into a subtype of cells.

The issue is that, after doing my analysis, the trajectory runs in the opposite direction to what we expect and know is happening.

My phase portraits don’t look good, and I read online that poor phase portraits can indicate that a dataset is simply incompatible with this kind of trajectory inference method.
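For readers unfamiliar with phase portraits, here is a minimal sketch of what they encode under the simplified steady-state model: for one gene, unspliced counts are plotted against spliced counts, a line u = gamma * s is fitted, and each cell's velocity is its residual from that line. The counts below are made up for illustration and are not from this dataset, and the helper names are hypothetical. This is not what scVelo's dynamical mode does internally; it only shows why a noisy or shapeless portrait gives unreliable velocities.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin, so that u is roughly gamma * s."""
    num = sum(s * u for s, u in zip(spliced, unspliced))
    den = sum(s * s for s in spliced)
    return num / den


def velocities(spliced, unspliced, gamma):
    """Residual of each cell from the steady-state line (u - gamma * s)."""
    return [u - gamma * s for s, u in zip(spliced, unspliced)]


# Hypothetical counts for one gene across six cells (illustration only).
spliced = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
unspliced = [0.5, 1.0, 1.5, 2.0, 3.5, 2.0]

gamma = fit_gamma(spliced, unspliced)
v = velocities(spliced, unspliced, gamma)

for s, u, vel in zip(spliced, unspliced, v):
    state = "induction" if vel > 0 else "repression"
    print(f"s={s:.1f} u={u:.1f} velocity={vel:+.3f} ({state})")
```

Cells above the fitted line have more unspliced RNA than the steady state predicts (positive velocity), and cells below it have less (negative velocity). If most genes show no clear structure around that line, the inferred direction depends heavily on noise and preprocessing choices.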

However, a bioinformatician from another university has run the analysis on the same dataset, and their trajectory lines are not only different from mine but also go in the direction we expect. I'm trying to understand what I could have done differently or incorrectly to make the results differ that much.

Here are my results:

[Image: dynamic trajectory inference, scVelo]

Cluster 3 contains the stem cells, and clusters 5 and 1 are the most differentiated populations.

[Image: phase portraits]

And these are the bioinformatician’s results:

[Image: bioinformatician's analysis]

The dataset is the same; only the UMAP is mirrored.

Although the result is not drastically different, we can at least observe the trajectory from the green subpopulation to the more differentiated blue and red populations.

All I know about their method is that they started with the FASTQ files.

Does anyone have an idea of what I could be doing differently to generate such a different result? And would anyone recognise which package the bioinformatician used to generate theirs?

My current pipeline is as follows:

Cell Ranger 7.0.1 to generate the possorted BAM file from the FASTQ files

cellranger count --id SC28_counts \
    --fastqs FASTQ/ \
    --sample SC28 \
    --transcriptome /data/databases/refdata-gex-mm10-2020-A \
    --include-introns true

velocyto 0.17.17 to generate the loom file of unspliced and spliced counts

velocyto run10x data/bam4velo/SC28 /data/imrb/databases/refdata-gex-mm10-2020-A/genes/genes.gtf

(I didn’t add a mask.gtf file; would that have affected the result much?)

scVelo and scanpy 0.3.1 to merge all the data together and generate the RNA velocity graph
- I preprocessed according to the scVelo tutorial
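
One thing worth checking at the merge step is that the cell IDs in the velocyto loom file actually match the barcodes in the processed object. velocyto run10x writes IDs in the form `SAMPLE:BARCODEx`, while Cell Ranger barcodes end in `-1`, so a silent mismatch can drop cells or misalign them. Below is a minimal sketch of such a sanity check, assuming those two naming schemes; the barcodes are made up and `normalize_barcode` is a hypothetical helper, not part of scVelo or scanpy. It is not a diagnosis of the problem above, just one source of divergence that is cheap to rule out.

```python
import re


def normalize_barcode(cell_id):
    """Reduce a cell ID to its bare nucleotide barcode."""
    barcode = cell_id.split(":")[-1]              # drop a 'SAMPLE:' prefix if present
    barcode = re.sub(r"(-\d+|x)$", "", barcode)   # drop a '-1' suffix or trailing 'x'
    return barcode


# Hypothetical IDs in the two naming schemes (illustration only).
velocyto_ids = ["SC28:AAACCCAAGAAACACTx", "SC28:AAACCCAAGAAACCATx"]
cellranger_ids = ["AAACCCAAGAAACACT-1", "AAACCCAAGGGTTTAA-1"]

velocyto_set = {normalize_barcode(c) for c in velocyto_ids}
cellranger_set = {normalize_barcode(c) for c in cellranger_ids}

shared = sorted(velocyto_set & cellranger_set)
only_velocyto = sorted(velocyto_set - cellranger_set)

print(f"{len(shared)} shared, {len(only_velocyto)} only in the loom file")
```

If the number of shared barcodes is much smaller than the number of cells in either object, the merge is the first place to look before comparing velocity results.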

Hopefully someone can help,

Thank you!!

RNAvelocity • 372 views

7 months ago by phhelou5 ▴ 10

1

Just ask the bioinformatician.

There are thousands of ways to process your data: reference, alignment, quantification, filtering count matrices, R/python packages used, parameters set, etc.

Btw, this is one reason why I dislike these types of downstream analyses: choosing different things gets you different results even when your workflow is reasonable. It lacks the rigor for me, as a biologist, to be comfortable with it (see the papers critiquing the methods). Try velocyto on it, you get one direction; try scvelo on it, you get the opposite direction.

Also, what do you hope to actually gain by "fitting" (or rather, overfitting) your analysis to a result that you want?

ADD REPLY • link 7 months ago by dsull ★ 6.9k

0

We are trying to ask, but we are working with a lab on another continent, and that lab is our only link to the bioinformatics platform that did this analysis. So it's quite hard to actually discuss anything: it takes a few days to get an answer, and video calls are very hard to plan because of the time difference.

I also strongly dislike RNA velocity and don't find it pertinent enough, and I have already read the paper you linked. However, I am not aware of any other fully unbiased trajectory inference methods (would you have anything to recommend?).

And finally, I am not trying to overfit the data to twist it into showing what I want. I am simply assuming that the bioinformatician's analysis is the more correct one, and it also happens to display the expected behaviour of the cell trajectory. So I am trying to see whether anyone has an idea of what I did wrong, or could do better, in order to reproduce it.

We know the studied cell population quite well. However, in a paper under review, the use of monocle3 for trajectory analysis was criticised because you have to choose the starting node yourself (we chose the one with the highest expression of stemness markers, but evidently that wasn't enough justification).
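
To make the root-selection step described above concrete, here is a minimal sketch of picking the starting cluster as the one with the highest mean stemness score. The per-cell scores are made up, only the cluster labels (3, 5, 1) come from this thread, and `pick_root_cluster` is a hypothetical helper rather than a monocle3 function.

```python
from statistics import mean


def pick_root_cluster(marker_scores, clusters):
    """Return the cluster with the highest mean marker score, plus all means."""
    per_cluster = {}
    for score, cluster in zip(marker_scores, clusters):
        per_cluster.setdefault(cluster, []).append(score)
    means = {c: mean(values) for c, values in per_cluster.items()}
    return max(means, key=means.get), means


# Hypothetical per-cell stemness scores (illustration only).
scores = [2.0, 1.8, 0.2, 0.1, 0.5, 0.4]
clusters = ["3", "3", "5", "5", "1", "1"]

root, means = pick_root_cluster(scores, clusters)
print(f"root cluster: {root}")
```

Reporting the per-cluster means alongside the chosen root makes the choice auditable, although it remains a prior-knowledge decision rather than an inferred one.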

No bioinformatician in my lab does RNA velocity, which is why I'm here trying to get advice from anywhere I can :).

ADD REPLY • link 7 months ago by phhelou5 ▴ 10