Question

Finding genes differently expressed in time series data

0

Entering edit mode

7.4 years ago

zizigolu ★ 4.4k

Hi,

I have Control cell line day1 to day 26 and Lead(pb) treatment day1 to day26. The experimental design aims to get the impact of Lead(pb) on developmental process .I must look if there are group of developmental genes that cluster differently in the lead(pb) experiment. For instance some genes are differentially expressed directly or others are coming in later time points. First I thought about WGCNA (figure) https://ibb.co/j9O5YS in which I think genes in significant modules in days 1-3 make sense for me. Then I thought about SOM clustering but clusters don't show any up or down regulation in day points

https://ibb.co/jEBW7n

Any help please?

rna-seq R gene • 2.6k views

ADD COMMENT • link 7.4 years ago by zizigolu ★ 4.4k

1

Entering edit mode

It depends on which definition of cluster you're interested in. For time series, I would compute a distance matrix based on dynamic time warping (see for example the R package dtw, be sure to read the FAQ down the page) and use this matrix as input for a clustering algorithm.

ADD REPLY • link 7.4 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Thanks a lot, for example in picture A (cluster 28 with 103 genes), genes toward week 9 are being up regulated and in cluster 3 with 203 genes to the end of development down regulated.

https://ibb.co/h7N2yS

ADD REPLY • link 7.4 years ago by zizigolu ★ 4.4k

1

Entering edit mode

What I had in mind was for you to compute a distance matrix using dtw for each condition separately then use this for simultaneous clustering using for example a tensor factorization method as I wrote in a recent answer which you've seen.

ADD REPLY • link 7.4 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Thanks a lot for sharing your idea. I have 52 samples (without replicates); control_day1 vs lead_day1 to control_day26 vs lead_day26. Then by dtw I must calculate 26 distance matrices for each control_day vs lead_day?????

Thanks again to help people to figure out

ADD REPLY • link 7.4 years ago by zizigolu ★ 4.4k

0

Entering edit mode

Sorry, this is my control and treatment matrices

> head(control[,1:4])
               MAST2     WWC2  PHYHIPL   R3HDM2
Control_D1  6.591024 5.695156 3.388652 5.756384
Control_D10 8.043454 5.365221 6.859768 6.936970
Control_D11 7.731590 4.868267 6.919972 6.931073
Control_D12 8.129948 5.105528 6.627016 7.090268
Control_D13 7.690863 4.729501 6.824746 6.904610
Control_D14 8.101723 5.334501 6.868990 7.115883
> 

> head(lead[,1:4])
              MAST2     WWC2  PHYHIPL   R3HDM2
Lead30_D1  6.418423 5.610699 3.734425 5.778046
Lead30_D10 7.918360 4.295191 6.559294 6.780952
Lead30_D11 7.807142 4.294722 6.599187 6.716040
Lead30_D12 7.856720 4.432136 6.572337 6.848483
Lead30_D13 7.827311 4.204738 6.607107 6.784094
Lead30_D14 7.848760 4.458451 6.581216 6.943003
>

library(dtw)

Control=dist(control,method="DTW");

Lead=dist(lead,method="DTW");

I googled too much but too confusing, is this distance matrices by dtw you suggested for tensor factorization ??

ADD REPLY • link 7.3 years ago by zizigolu ★ 4.4k

0

Entering edit mode

Don't you get an error ? dist() is just the standard distance function in R, it doesn't know anything about dtw. Start by reading the dtw doc. What you need to do is apply the dtw() function to each pair of items for which you want the dtw distance and extract the distance from the resulting data structure e.g.

  for(i ...) {
    for(j ...) {
      alignment <- dtw(item[i],item[j])
      distance[i, j] <- alignment$distance
    }
}

You may need to adjust to your needs, maybe use a symmetric version or the normalized distance. Check the doc.

ADD REPLY • link 7.3 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Thank you, since creating this post I am googling for dtw but seems to complex. I did not get error and gives me a class 'dist' atomic

you mean I should put control and lead in this code?

for(i in control) {
    for(j in lead) {
      alignment <- dtw(item[i],item[j])
      distance[i, j] <- alignment$distance
    }
}

That gave me

Error in dtw(item[i], item[j]) : object 'item' not found

and this

for (i in row.names(control)) { 
  for (j in row.names(lead)) { 
    result[i,j] <- dtw( dist(x[,,i],x[,,j]), distance.only=T )$normalizedDistance }}

Gave

Error in x[, , j] : incorrect number of dimensions

ADD REPLY • link 7.3 years ago by zizigolu ★ 4.4k

0

Entering edit mode

item was just an example variable. Replace by what's appropriate in your case (control and lead maybe).

ADD REPLY • link 7.3 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Sorry, Finally I was able to run your tutorial on my data.

I used two separate weighted adjacency matrices of control and treatment data (each 26 samples and 50 genes and I used genes as nodes).

my output matrices are not identical rather I can't understand how use the output to find which parts of the treatment-responses are specific to the treatment network?

https://ibb.co/f6DWen

 [[3]]
              [,1]       [,2]       [,3]       [,4]
    [1,] 0.5466689 -0.5357849 -0.4886675 -0.4788609
    [2,] 0.4533311 -0.4642151 -0.5113325 -0.5211391

How can I use this matrix to find responsive genes? Thank you for your time

ADD REPLY • link 7.3 years ago by zizigolu ★ 4.4k

0

Entering edit mode

Hi, I just found this post. I am dealing with the same problem and wondering whether I can get some help here. How did you solve it? Do you mind share the piece of code here? and did you manage to find responsive genes?

Thank you very much!

ADD REPLY • link 5.1 years ago by Pita • 0

0

Entering edit mode

Hello Honestly I gave up on tensor flow that time and used WGCNA R package to correlate time points with treatment. This package correlates blocks of gene with given trait

ADD REPLY • link 5.1 years ago by zizigolu ★ 4.4k

1

Entering edit mode

I gave up on tensor flow

You probably mean tensor factorization, tensorflow is a software library :)

ADD REPLY • link 5.1 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

The key questions are whether and how you want to make use of the time information. WGCNA summarizes all the information into one correlation matrix viewed as the adjacency matrix of a graph and proceeds under the assumption the graph has scale-free topology. This seems to be fine for most people but to me it discards information that could be interesting/useful. In any case, don't use an approach you don't understand.

ADD REPLY • link 5.1 years ago by Jean-Karim Heriche 27k