I'm trying to run MDSeq, which appears to require a plain data frame with three things: counts, cells, and labels for conditions (in my case, the clusters found by Seurat's clustering).
I'm completely lost as to which function, if any, could strip all data and metadata except for those three and then convert the result into a data frame.
Alternatively, I am using the script below to split the object into one data frame per cluster. Could or should I combine these and retain/add the cluster labels? That seems less elegant than the other approach.
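```r
# One data frame of SCT counts per cluster: data_cluster_0, data_cluster_1, ...
cluster_list <- levels(data_transcripts_seurat@active.ident)
for (cluster in cluster_list) {
  name <- paste("data_cluster_", cluster, sep = "")
  # Keep only the cells assigned to this cluster
  clustersubset <- subset(data_transcripts_seurat, idents = cluster, slot = "counts")
  # Store the genes x cells counts in a global variable with that name
  assign(name, as.data.frame(clustersubset@assays$SCT@counts))
}
```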
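If you do keep the per-cluster route, the frames can be stitched back together with one label per cell. A minimal sketch, using two small made-up data frames in place of the loop's output (`data_cluster_0`, `data_cluster_1`, the gene names and the barcodes are all hypothetical). It assumes every subset keeps the same genes in the same row order, and checks that before binding:

```r
# Hypothetical stand-ins for the loop's output: genes in rows,
# that cluster's cells (barcodes) in columns.
data_cluster_0 <- data.frame(cell_1 = c(0, 5), cell_3 = c(1, 2),
                             row.names = c("GeneA", "GeneB"))
data_cluster_1 <- data.frame(cell_2 = c(3, 0),
                             row.names = c("GeneA", "GeneB"))

cluster_list <- c("0", "1")

# Fetch the per-cluster frames by the names the loop gave them
per_cluster <- lapply(cluster_list, function(cl) get(paste0("data_cluster_", cl)))

# The genes must be identical and in the same order in every frame
genes <- rownames(per_cluster[[1]])
stopifnot(all(vapply(per_cluster, function(df) identical(rownames(df), genes),
                     logical(1))))

# Put all cells side by side: genes x (all cells)
combined <- do.call(cbind, per_cluster)

# One cluster label per column, repeated by each cluster's number of cells
cluster_labels <- factor(rep(cluster_list, times = vapply(per_cluster, ncol, integer(1))),
                         levels = cluster_list)
names(cluster_labels) <- colnames(combined)
```

Collecting the subsets in a list (for example with `lapply`) instead of creating global variables with `assign()` would remove the `get()` step; starting from the full counts matrix avoids the split altogether.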
Why not start with `data_transcripts_seurat@assays$SCT@counts`? Something like:
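A minimal sketch of that conversion, using a small made-up matrix as a stand-in for the real counts (the gene names and barcodes are hypothetical). In a Seurat object the counts are usually stored as a sparse matrix, so converting with `as.matrix()` first gives an ordinary dense data frame; for a large dataset that dense copy can take a lot of memory:

```r
# Toy stand-in for data_transcripts_seurat@assays$SCT@counts:
# genes in rows, cells (barcodes) in columns. All names are hypothetical.
counts <- matrix(c(0, 3, 1,
                   5, 0, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"),
                                 c("cell_1", "cell_2", "cell_3")))

# as.matrix() densifies a sparse matrix (it is a no-op on this dense toy one);
# as.data.frame() keeps gene names as row names and barcodes as column names.
counts.df <- as.data.frame(as.matrix(counts))
```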
If you need the cells to be rows, you can use the `t()` function.
Thanks! That's a much better way to do it. Could you please also explain how to transfer the cluster assignments (correctly) to this data frame?
What does `data_transcripts_seurat@active.ident` get you? Shouldn't those be the cluster assignments per cell? [Disclaimer: I do not use Seurat, so I'm basing this off your initial code]
Hi, yes, I think they are! I did not realise this before (I'm also very new to Seurat, and to transcriptomics in general). How can I add this to `counts.df`? Is it with the `attributes` function?
Also, should I be concerned about the order of cells being different in `counts.df` and `data_transcripts_seurat@active.ident`? Are there ways to make sure they are matched?
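One way to handle both points, sketched under assumptions: attach the cluster labels as an ordinary column rather than through `attributes()` (custom attributes tend to be lost when a data frame is subset or reshaped), and match labels to cells by barcode instead of by position, so a difference in order cannot silently mislabel cells. `@active.ident` (also available as `Idents(data_transcripts_seurat)` in Seurat) is a factor named by cell barcode. The small matrix and factor below are made-up stand-ins for the real objects, and whether MDSeq wants cells as rows or a separate label vector is an assumption to check against its documentation, so both layouts are shown:

```r
# Hypothetical stand-ins:
# counts ~ data_transcripts_seurat@assays$SCT@counts (genes x cells)
# idents ~ data_transcripts_seurat@active.ident (cluster per cell, named by barcode)
counts <- matrix(c(0, 3, 1,
                   5, 0, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"),
                                 c("cell_1", "cell_2", "cell_3")))
idents <- factor(c("0", "0", "1"))
# Deliberately listed in a different order from the columns of counts
names(idents) <- c("cell_3", "cell_1", "cell_2")

# Every cell in the counts must have exactly one label, and vice versa
stopifnot(setequal(colnames(counts), names(idents)),
          !anyDuplicated(names(idents)))

# Option 1: cells as rows, with the cluster as an ordinary column.
# Labels are looked up by barcode, so their original order does not matter.
counts_t.df <- as.data.frame(t(as.matrix(counts)))
counts_t.df$cluster <- idents[rownames(counts_t.df)]

# Option 2: keep genes x cells and build a label vector in column order
cluster_by_column <- idents[colnames(counts)]
stopifnot(identical(names(cluster_by_column), colnames(counts)))
```

In a real Seurat object the identities are normally stored in the same order as the cells, so the lookup will usually change nothing; the `stopifnot()` checks are there so that any mismatch fails loudly instead of silently.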