Hello everyone,
I have RNA-seq data from human cardiomyocyte samples collected at 5 time points during differentiation (Day 0, Day 2, Day 5, Day 15, Day 30), so the model is a directed differentiation system. I am using a file of normalized RPM values for each transcript ID from a previous transcriptome quantification step (with Cufflinks), and I eventually plan on "grepping" these transcript IDs to the corresponding Gene_IDs. What I have is essentially a matrix with cuff.IDs in the first column and expression values in 5 columns, one per time point.

I want to build a gene regulatory network that captures the differentiation process in our cardiomyocyte samples, using genes that are consistently differentially expressed across the differentiation time points. My idea was to run a differential expression analysis of each time point against Day 0, using Day 0 as the control, and then select the genes that remain differentially expressed in all four comparisons (Day 0 vs. Day 2, Day 5, Day 15 and Day 30). My intention was to perhaps rerun DESeq2 in R in this manner.

However, when I mentioned this idea to my PI, I was told that I could instead calculate the covariance among the samples, rank the genes, and select the top few genes using Excel. I have no idea how to approach this in Excel. I am completely inexperienced in bioinformatics, programming and statistics, and I barely used a PC until 5 months ago. I would appreciate a step-by-step tutorial on how to approach this in Excel for my specific project. I am aware there are many tutorials out there, but none are clear to me and they are causing more confusion. For example, when I calculate the covariance between two lists of genes, the result is only one value. What can I do with this covariance value in Excel in order to rank the genes by covariance?
My supervisor instructed me to use R to get these results. However, I am terrible with R. I cannot even figure out which function to use to read the file; read.table is giving me some issues. These are the commands that my supervisor advised me to use to obtain the variance for the list:
topVarGenes <- head(order(rowVars(data[,2:6]),decreasing=TRUE),15)
gene_lists <- cbind(data[topVarGenes,], rowVars(data[topVarGenes,2:6]))
write.table(gene_lists,file='topVarGenes.txt',quote=FALSE,sep="\t")
### So rowVars calculates the variance of each row (gene), and order ranks the genes by it.
The above is just not working. I think it might have to do with how I loaded the data, but I am so inexperienced in R that I am not certain what the issue is. I am speculating that maybe it should be a data.frame. I would be much obliged if I could get step-by-step R commands to get the results I need.
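For reference, here is a minimal sketch of how the commands above could be made to run end to end. It is not a confirmed diagnosis: rowVars() is not part of base R (it is provided by packages such as matrixStats) and it expects a numeric matrix rather than a data frame, so either point could be the cause. The sketch uses the base R equivalent apply(..., 1, var) to avoid the extra package. The toy table, the column name cuff_id, the temporary file and n_top are all made up for illustration; the real file may need different header and sep settings.

```r
# Toy stand-in for the real file; IDs and values are invented for illustration.
toy <- data.frame(
  cuff_id = c("CUFF.1", "CUFF.2", "CUFF.3", "CUFF.4"),
  Day0  = c(1, 10, 5, 0),
  Day2  = c(2, 10, 5, 3),
  Day5  = c(4, 11, 5, 9),
  Day15 = c(8, 10, 5, 27),
  Day30 = c(16, 9, 5, 81)
)
path <- tempfile(fileext = ".txt")
write.table(toy, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read a tab-delimited file whose first line holds the column names.
# read.table returns a data.frame.
data <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Columns 2 to 6 hold the five time points; convert them to a numeric matrix.
expr <- as.matrix(data[, 2:6])

# Variance of each row (gene) across the time points.
# This is what rowVars(expr) would return with matrixStats loaded.
gene_var <- apply(expr, 1, var)

# Rank genes by variance, highest first, and keep the top n_top of them.
n_top <- 2
top_idx <- head(order(gene_var, decreasing = TRUE), n_top)
gene_lists <- cbind(data[top_idx, ], variance = gene_var[top_idx])

# Write the ranked table next to the toy input (temporary directory).
out_path <- file.path(tempdir(), "topVarGenes.txt")
write.table(gene_lists, file = out_path, quote = FALSE, sep = "\t",
            row.names = FALSE)
```

Running str(data) right after read.table is a quick way to check that the five time-point columns were read as numbers and not as text, which is a common reason for the variance step to fail.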
Also, if I wanted to instead calculate the covariance against Day 0 for all samples, how would I modify the commands?
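A small sketch may help show why covariance gives only one value here. cov() between the Day 0 column and another day pools all genes into a single number, so it cannot rank individual genes. As a swapped-in alternative (not the covariance the PI described), the sketch scores each gene by its log2 fold change versus Day 0. The toy matrix, the pseudocount of 1 and the "smallest absolute change" score are assumptions made for illustration. With one sample per time point this is only a descriptive ranking, not a statistical test of differential expression.

```r
# Toy matrix: rows are genes, columns are time points (invented values).
expr <- matrix(
  c(1, 2, 4, 8, 16,
    10, 10, 11, 10, 9,
    8, 4, 2, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CUFF.1", "CUFF.2", "CUFF.3"),
                  c("Day0", "Day2", "Day5", "Day15", "Day30"))
)

# Covariance between two time-point columns: one number for the whole pair,
# computed across all genes, so it says nothing about any single gene.
cov_day0_day30 <- cov(expr[, "Day0"], expr[, "Day30"])

# Per-gene alternative: log2 fold change of each later day versus Day0.
# The pseudocount of 1 is an assumption to avoid taking log2(0).
log_expr <- log2(expr + 1)
lfc <- log_expr[, -1] - log_expr[, "Day0"]

# Score each gene by its smallest absolute change versus Day0, so a high
# score means the gene differs from Day0 at every later time point.
min_abs_lfc <- apply(abs(lfc), 1, min)
ranked <- sort(min_abs_lfc, decreasing = TRUE)
```

Here ranked is a named vector with the most consistently changed gene first; replacing min with max would instead favor genes that change strongly at any single time point.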
I know I have asked a lot of questions and I am very grateful in advance to whoever takes the time to respond.
Please can you add sample data and the output.
Yes, I can. Which output are you referring to?