Question

Clustering for Single-cell RNA-seq Data

Asked 5.4 years ago by aloke205

Dear members,

I have a CSV file from a single-cell RNA-seq experiment with three columns: unique Cell-IDs (first column), Cluster-IDs (second column) and CloneIDs (third column).

I need to generate a heatmap from this CSV file in R to detect whether cells within or across clusters are clonally related.

Please help. Thanks in advance!

Tags: clustering, R, single-cell

aloke205, 5.3 years ago

1. Read the CSV into a data.frame
2. Convert the data.frame to a matrix
3. Use a popular heatmap package such as ComplexHeatmap, pheatmap or heatmap.2 (from gplots)
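A minimal sketch of steps 1 and 2, written in Python with only the standard library rather than R. It assumes the file has a header row, the three columns in the order given in the question (cell ID, cluster ID, clone ID), and integer cluster and clone IDs. The inline CSV and the function names `read_cells` and `clone_by_cluster_matrix` are hypothetical. The resulting clone × cluster count matrix is the kind of input a heatmap package such as ComplexHeatmap or pheatmap expects; drawing the heatmap itself is not shown.

```python
import csv
import io
from collections import Counter

# Hypothetical example input; the real file's header names may differ.
EXAMPLE_CSV = """CellID,ClusterID,CloneID
cell1,5,1
cell2,26,1
cell3,1,2
cell4,2,2
cell5,2,2
cell6,4,2
"""

def read_cells(handle):
    """Read (cell, cluster, clone) triples from a 3-column CSV, skipping the header."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(cell, cluster, clone) for cell, cluster, clone in reader]

def clone_by_cluster_matrix(cells):
    """Return sorted clone IDs, sorted cluster IDs and a clone x cluster count matrix."""
    counts = Counter((clone, cluster) for _, cluster, clone in cells)
    clones = sorted({clone for _, _, clone in cells}, key=int)  # assumes integer IDs
    clusters = sorted({cluster for _, cluster, _ in cells}, key=int)
    matrix = [[counts[(clone, cluster)] for cluster in clusters] for clone in clones]
    return clones, clusters, matrix

cells = read_cells(io.StringIO(EXAMPLE_CSV))
clones, clusters, matrix = clone_by_cluster_matrix(cells)
print("clone", *clusters, sep="\t")
for clone, row in zip(clones, matrix):
    print(clone, *row, sep="\t")
```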

Reply by Ram, 5.4 years ago

Have you tried anything yet? What exactly is the roadblock you're encountering?

Reply by Friederike, 5.4 years ago

I have been able to generate a heatmap (available at the link below) using ComplexHeatmap, as suggested by @RamRS.

As I am new to this analysis, I am stuck on how to detect whether cells within or across clusters are clonally related. Hence I am asking for help :(

Reply by aloke205, 5.4 years ago

Please read some tutorials on both scRNA-seq and clustering, e.g. the Seurat tutorial https://satijalab.org/seurat/vignettes.html, to get the basic ideas. scRNA-seq data are not trivial to analyze, given their noisy nature and zero inflation. If you are not an expert, stick 100% to published tutorials and do not add custom approaches. Go through the literature, read tutorials and blogs, and try to follow them. Use established tools for everything; Seurat is a good starting point.

For clustering (depending on the goal), you typically log-transform the counts and optionally convert them to Z-scores. Log-transforming compresses the range and reduces the influence of large counts on the color scale, which is exactly the problem in your heatmap: a few rows dominate it and render the rest of the values unreadable.
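A rough sketch of the two transformations described above, assuming a matrix of non-negative counts stored as a list of rows (e.g. genes). The log base 2, the pseudocount of 1 and the use of the population standard deviation are illustrative assumptions, not something specified in this thread; the function names are hypothetical.

```python
import math
import statistics

def log_transform(matrix, pseudocount=1.0):
    """Apply log2(count + pseudocount) to every entry to compress large counts."""
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]

def zscore_rows(matrix):
    """Center each row on its mean and scale it by its population standard deviation."""
    scaled = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)
        # A constant row has sd == 0; map it to all zeros instead of dividing by zero.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled

counts = [[0, 1, 3, 1000], [2, 2, 2, 2]]  # hypothetical counts; row 1 has one huge value
logged = log_transform(counts)
print(logged)
print(zscore_rows(logged))
```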

Reply by ATpoint, 5.4 years ago

Does that mean that you would like to see if cells within the same cluster are more likely to share the clone ID than cells across different clusters?

Reply by Friederike, 5.4 years ago

Yes, that is what I am looking for. Could you please help me?

Reply by aloke205, 5.4 years ago

I don't fully understand where the actual problem is for you. Is it using R that you find difficult? (If that is the problem, you could open the file in Excel and forget about R.) You could, for example, simply sort by clone ID; presumably there aren't that many cells that share the same clone ID anyway. Then you could calculate the fraction of cells from each cluster that happen to share that clone ID.
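The sort-and-group step could look like this in Python. This is a sketch, assuming the rows have already been read as (cell, cluster, clone) triples and that clone IDs are integers; the example cells and the names `clone_key` and `clusters_per_clone` are hypothetical.

```python
from itertools import groupby

# Hypothetical (cell, cluster, clone) triples standing in for the real CSV rows.
cells = [
    ("cell1", "5", "1"), ("cell2", "2", "2"), ("cell3", "26", "1"),
    ("cell4", "4", "2"), ("cell5", "2", "2"), ("cell6", "7", "3"),
]

def clone_key(cell):
    return int(cell[2])  # assumes clone IDs are integers

def clusters_per_clone(cells):
    """Sort cells by clone ID and list the cluster ID of every cell in each clone."""
    ordered = sorted(cells, key=clone_key)  # groupby needs the input sorted by the key
    return {
        str(clone): [cluster for _, cluster, _ in group]
        for clone, group in groupby(ordered, key=clone_key)
    }

for clone, members in clusters_per_clone(cells).items():
    print(clone, members)
```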

Reply by Friederike, 5.4 years ago

Thanks a lot for your answer. This is very useful for me.

My main problem is understanding how to quantify whether cells within or across clusters are clonally related using the CSV file described above. Additionally, I am new to this type of analysis, and my professor asked me to present the result in one graph or heatmap using R within a week.

As I am still learning, I am afraid I will not be able to complete this by myself within a week. That is why I am seeking help :(

Reply by aloke205, 5.4 years ago

If you're new to R, I suggest you first try to forget about the pressure of having to learn R for this and think about the problem at hand.

If "clonally related" means "cells that have the same Clone ID", first check whether any cells actually share the same clone ID. If every single cell has a distinct clone ID, go back to your professor and ask how they would go about identifying related cells based on the three columns you have.
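One way to do this first check, sketched in Python under the assumption that the CloneID column has been read into a list of strings (the example values and the name `shared_clones` are hypothetical): count the cells per clone ID and keep only the clone IDs carried by more than one cell. An empty result means every cell has a distinct clone ID.

```python
from collections import Counter

clone_ids = ["1", "1", "2", "2", "2", "3", "4"]  # hypothetical CloneID column

def shared_clones(clone_ids):
    """Return {clone_id: n_cells} for clone IDs carried by more than one cell."""
    sizes = Counter(clone_ids)
    return {clone: n for clone, n in sizes.items() if n > 1}

shared = shared_clones(clone_ids)
if shared:
    print(f"{len(shared)} clone IDs are shared by 2+ cells: {shared}")
else:
    print("Every cell has a distinct clone ID")
```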

If you do find a group of cells that share the same clone ID, count how many times each cluster ID is present within that group. Then count how many times each cluster is present in the full data set and calculate the fraction for each cluster: [cells_with_same_cloneID and cluster X] / [all cells for cluster X]
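The ratio above can be written as follows, using notation introduced here only for illustration: n_{c,k} is the number of cells with clone ID c in cluster k, and N_k is the number of cells in cluster k. If every cell carries exactly one clone ID, the fractions for a given cluster sum to 1 over all clones, which is a useful sanity check on the counts.

```latex
f_{c,k} = \frac{n_{c,k}}{N_k}, \qquad
\sum_{c} f_{c,k} = \frac{\sum_{c} n_{c,k}}{N_k} = \frac{N_k}{N_k} = 1
```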

That could be a starting point from which to discuss further with your professor. If you want a visual aid, you could make a bar chart or dot plot of the fractions.

Reply by Friederike, 5.3 years ago (updated by Ram)

Thank you for answering in detail.

Yes, in our analysis "clonally related" means "cells that have the same Clone ID".

After reading your answer, I cross-examined my data set and found that there are numerous groups of cells that share the same clone ID.

I will try to solve the problem as you suggested. Thank you for your help; it means a lot to me.

Reply by aloke205, 5.3 years ago

Hi Friederike, I have been able to count how many times each cluster ID is present within each clone ID group, e.g.:

cloneID  ClusterID  count
1        5          1
1        26         1
2        1          1
2        2          2
2        4          4
2        12         1
2        16         1
2        19         1
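A sketch of how a table like this could be produced in Python with `collections.Counter`, assuming (cell, cluster, clone) triples with integer IDs. The cells below are hypothetical and were chosen only so that the counts reproduce the eight rows shown above; the name `clone_cluster_counts` is also hypothetical.

```python
from collections import Counter

# Hypothetical cells chosen to reproduce the clone 1 and clone 2 rows above.
cells = [
    ("c01", "5", "1"), ("c02", "26", "1"),
    ("c03", "1", "2"), ("c04", "2", "2"), ("c05", "2", "2"),
    ("c06", "4", "2"), ("c07", "4", "2"), ("c08", "4", "2"), ("c09", "4", "2"),
    ("c10", "12", "2"), ("c11", "16", "2"), ("c12", "19", "2"),
]

def clone_cluster_counts(cells):
    """Count cells for every (clone ID, cluster ID) pair that occurs in the data."""
    return Counter((clone, cluster) for _, cluster, clone in cells)

counts = clone_cluster_counts(cells)
print("cloneID\tClusterID\tcount")
# Sort numerically by clone ID, then by cluster ID.
for (clone, cluster), n in sorted(counts.items(), key=lambda kv: (int(kv[0][0]), int(kv[0][1]))):
    print(clone, cluster, n, sep="\t")
```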

I then counted how many times each cluster is present in the full data set, e.g.:

ClusterID  count
1          18
2          112
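The per-cluster totals are a single count over the ClusterID column. In this hypothetical sketch the column is faked with 18 cells in cluster 1 and 112 in cluster 2, matching the two totals quoted above; the real data set contains more clusters.

```python
from collections import Counter

# Hypothetical ClusterID column matching the two totals above.
cluster_ids = ["1"] * 18 + ["2"] * 112

cluster_totals = Counter(cluster_ids)  # cells per cluster in the full data set
print("ClusterID\tcount")
for cluster in sorted(cluster_totals, key=int):
    print(cluster, cluster_totals[cluster], sep="\t")
```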

Now I am having trouble calculating the fraction for each cluster, specifically the [cells_with_same_cloneID and cluster X] part.

For instance, in the above result, the cells with clone ID 1 belong to clusters 5 and 26, but I do not understand how to estimate [cells_with_same_cloneID and cluster X].

Could you please guide me? I would be grateful for your help.

Reply by aloke205, 5.3 years ago (updated by GenoMax)

Hi Friederike, I tried to calculate the fractions for each cluster. Below is a sample of the result:

cl_ID   cs_ID   cs_count    cs_total_count  (cs_count/cs_total_count)
1       5       1           9               0.1111111 
1       26      1           8               0.125 
2       1       1           18              0.05555556 
2       2       2           112             0.01785714 
2       4       4           61              0.06557377 
2       12      1           9               0.1111111 
2       16      1           9               0.1111111 
2       19      1           12              0.08333333

where

cl_ID = CloneID
cs_ID = ClusterID
cs_count = number of cells from that cluster within the clone
cs_total_count = total number of cells in that cluster in the full data set
(cs_count/cs_total_count) = fraction for each cluster
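A short Python sketch that recomputes the last column from the cs_count and cs_total_count values in the table above (the name `add_fractions` is hypothetical). By the definitions given above, cs_count is the number of cells with a given clone ID in cluster X and cs_total_count is the number of all cells in cluster X, so this ratio appears to be the same quantity as the [cells_with_same_cloneID and cluster X] / [all cells for cluster X] fraction suggested earlier in the thread.

```python
# (cl_ID, cs_ID, cs_count, cs_total_count) copied from the table above.
rows = [
    ("1", "5", 1, 9), ("1", "26", 1, 8),
    ("2", "1", 1, 18), ("2", "2", 2, 112), ("2", "4", 4, 61),
    ("2", "12", 1, 9), ("2", "16", 1, 9), ("2", "19", 1, 12),
]

def add_fractions(rows):
    """Append cs_count / cs_total_count to each row."""
    return [(cl, cs, n, total, n / total) for cl, cs, n, total in rows]

table = add_fractions(rows)
for cl, cs, n, total, frac in table:
    # 7 significant digits, matching the precision shown in the table above.
    print(cl, cs, n, total, f"{frac:.7g}", sep="\t")
```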

Please let me know whether I am moving in the right direction.

Reply by aloke205, 5.3 years ago (updated by Ram)

I cannot tell you whether you're moving in the right direction, because that direction depends on your prof. The question I'd have for you at this point: do you understand what those numbers mean, i.e. are you learning anything about the population of cells you're looking at?

Reply by Friederike, 5.3 years ago

Thanks for your reply. Yes, I now understand what I am doing, and I am really grateful to you, because this would not have been possible without your guidance. I have one more question :)

Could you please tell me whether I have calculated the fractions for each cluster correctly in the above example?

Though I think it may be correct, I have some doubt whether [cells_with_same_cloneID and cluster X] / [all cells for cluster X] and cs_count/cs_total_count correspond to the same fraction for each cluster in the above example.

Once you confirm, I will generate a bar chart and discuss it further with my prof.

Thanks a lot for your help

Reply by aloke205, 5.3 years ago

It seems perfect, as suggested by Friederike.

However, I would like to add a few more lines:

In the end, try to make a matrix or data frame in which each row represents a clone ID and each column represents a cluster ID. Then use that data frame or matrix to generate a heatmap or bar chart as suggested by Friederike.
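A minimal sketch of that pivot in Python, assuming long-format (clone ID, cluster ID, fraction) rows like those in the fraction table earlier in the thread and integer IDs. Clone/cluster combinations that never occur are filled with 0. The function name `to_matrix` is hypothetical, and the resulting matrix would still need to be passed to a plotting tool (e.g. ComplexHeatmap in R) to draw the heatmap.

```python
# (clone ID, cluster ID, fraction) rows taken from the fraction table in this thread.
rows = [
    ("1", "5", 1 / 9), ("1", "26", 1 / 8),
    ("2", "1", 1 / 18), ("2", "2", 2 / 112), ("2", "4", 4 / 61),
    ("2", "12", 1 / 9), ("2", "16", 1 / 9), ("2", "19", 1 / 12),
]

def to_matrix(rows):
    """Pivot long (clone, cluster, value) rows into a clone x cluster matrix, filling gaps with 0."""
    values = {(clone, cluster): v for clone, cluster, v in rows}
    clones = sorted({clone for clone, _, _ in rows}, key=int)  # rows = clone IDs
    clusters = sorted({cluster for _, cluster, _ in rows}, key=int)  # columns = cluster IDs
    matrix = [[values.get((clone, cluster), 0.0) for cluster in clusters] for clone in clones]
    return clones, clusters, matrix

clones, clusters, matrix = to_matrix(rows)
print("clone", *clusters, sep="\t")
for clone, row in zip(clones, matrix):
    print(clone, *(f"{v:.3f}" for v in row), sep="\t")
```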

Again, I completely agree with what Friederike said above:

If you're new to R, I suggest you first try to forget about the pressure of having to learn R for this and think about the problem at hand.

Hope this helps :)

Reply by Manoj, 5.3 years ago

Thanks @mkgupta.bioinfo for the information :) I will try to generate the matrix and heatmap as you suggested.

Reply by aloke205, 5.3 years ago

Hi aloke, have you solved the problem?

Reply by heididunst, 4.6 years ago

clonally related

I am not sure that term should be used for your data, since clonality indicates a different kind of relationship, at least in cancer (e.g. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003665).

What does the clone ID refer to here?

Reply by GenoMax, 5.3 years ago

Two different cells with the same clone ID share the same lineage.

Reply by aloke205, 5.3 years ago