I was trying to identify modules from RNA-seq count data with the help of the igraph tutorial. Graph and tree object generation, and community identification with the cluster_louvain() method, ran without any error. I used almost the same code as shown in the tutorial. However, when I tried to output the community information, it threw an error:
commSummary <- data.frame(
mst.communities$names,
mst.communities$membership,
mst.communities$modularity)
colnames(commSummary) <- c("Gene", "Community", "Modularity")
Error in data.frame(mst.communities$names, mst.communities$membership, : arguments imply differing number of rows: 17965, 3
From a Stack Overflow solution for a similar problem, it looks like n_rows != n_cols is the problem, but that is not the case here. Can anyone suggest a solution for this error?
dim(df)
[1] 17966 6
head(df)
8hc 9hc 12hc 8hs 9hs 12hs
02m156150 -4.275 9.528 -3.061 -3.17 0.134 9.659
02m256640 -4.025 10.005 -2.21 -2.656 1.142 9.397
02m146600 -3.154 8.553 0.444 -3.728 1.394 10.909
03m313660 -2.139 10.54 -1.644 -2.103 0.923 8.965
02m151678 -4.025 8.164 -2.323 -3.768 -1.609 7.888
If required, I can paste the whole code that I used.
Hi Kevin, thanks for the reply. I checked the output of those three. The lengths of mst.communities$names and mst.communities$membership are the same. However, I noticed that most of the genes in mst.communities$membership are assigned to the 1st module and the remaining ones to the 2nd and 3rd modules. As for the output of mst.communities$modularity, I don't know if that is a length problem.
I tried the same code with a subset of 2000 genes (i.e., only TF genes) and it worked well. Do you have any suggestions in this case?
If they are of the same length and are each a vector, then there should be no issue with that command. Can you confirm the output of:
?
Regarding the finding that the majority of genes are in the same module, that could reflect the structure of your data - is it non-uniform (heteroskedastic)? If your data is RNA-seq, the input should have been transformed, normalised data.
Yes, I used normalized data from RNA-seq. I tried both the normalized count data and the log2-transformed data, but both gave the same result. I checked the output and it looks like,
Oh, the modularity vector only has a length of 6. That may be due to some change introduced in igraph, but I am not sure...
You can definitely combine names and membership into a data-frame, though (they have equal length).
Another point to note is that you seem to have a vast number of modules / communities... it can be difficult to work with that many.
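To make the suggestion above concrete, here is a minimal sketch of building the summary from names and membership only. It runs on a toy ring graph, not your data: g, comm, Q and commSizes are made-up names for illustration, and g merely stands in for your MST. As far as I know, the $modularity element returned by cluster_louvain() holds one value per level of the multi-level algorithm rather than one per gene, which would explain the length mismatch, but treat that as an assumption and check lengths() on your own object.

```r
library(igraph)

# Toy graph standing in for the MST built from the expression data
# (assumption: the real graph has gene IDs as vertex names).
set.seed(1)
g <- make_ring(12)
V(g)$name <- paste0("gene", seq_len(vcount(g)))

comm <- cluster_louvain(g)

# Diagnostic: compare the lengths of the three elements.
# names and membership have one entry per vertex; modularity usually does not.
print(lengths(list(
  names      = comm$names,
  membership = comm$membership,
  modularity = comm$modularity
)))

# One row per gene, using only the per-vertex vectors.
commSummary <- data.frame(
  Gene      = V(g)$name,
  Community = as.integer(membership(comm)),
  stringsAsFactors = FALSE
)

# Modularity describes the whole partition, so keep it as a single number
# instead of forcing it into a per-gene column.
Q <- modularity(comm)

# Number of genes in each community, useful for spotting one dominant module.
commSizes <- sizes(comm)
print(commSizes)
```

If you still want a Modularity column, you could add commSummary$Modularity <- Q, which simply repeats the same value in every row.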