Za • 6.3 years ago
Sorry, I have just read these two papers, but there does not seem to be any Git repository or R package for the methods they employed; they only describe the steps in the Methods section. How could I run the methods used in these two papers on my own data? I know this is a long shot, but most papers I have found point somewhere to the code they used.
http://science.sciencemag.org/content/early/2018/04/25/science.aar5780
http://science.sciencemag.org/content/early/2018/04/25/science.aar4362
Email the authors if you can't find any information in paper/supplementary materials (which should be on the Science site).
Thank you, I did that, but they have not responded yet.
Sorry, one of these papers says:
For each two adjacent time points, we embedded all cells from the two time-points in the PCA space learned from the second time point only, keeping non-trivial PCs as defined above. This embedding causes cell-cell distances to reflect gene expression variation between tissues, as opposed to global changes over time.
Do they mean I should perform principal component analysis on the cells from two neighbouring time points, using the PC loadings learned from the later time point, and then select the significant PCs?
This means that the authors process time points in pairs, doing PCA on the second time point and projecting the data from the first time point onto this space. This way, only consecutive time points are comparable; they do this as a heuristic to reconcile the clusters from the beginning to the end of their time series. While they justify it as reflecting "gene expression variation between tissues, as opposed to global changes over time", I think a more principled, global approach using tensor factorization is worth exploring, because the components would be the same for all time points, and these components can be interpreted as clusters and may reflect some of the biology.
Sorry, I did not understand this part
doing PCA on the second time point and project the data for the first time point onto this space
Could you please explain that in more detail?
This means applying PCA to the data of the second time point, then taking the data from the first time point and projecting it into the PCA space of the second time point.
In R, this would look something like:
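A minimal sketch of that projection step, using base R's `prcomp`. The matrices `t1` and `t2` here are made-up placeholders for your own cells-by-genes expression matrices (same genes in the same column order); they are not from the paper:

```r
set.seed(1)
t1 <- matrix(rnorm(50 * 10), nrow = 50)  # 50 cells from time point 1, 10 genes
t2 <- matrix(rnorm(40 * 10), nrow = 40)  # 40 cells from time point 2, same 10 genes

# Learn the PCA space from the second time point only
pca_t2 <- prcomp(t2, center = TRUE, scale. = FALSE)

# Scores of the t2 cells in their own PCA space
scores_t2 <- pca_t2$x

# Project the t1 cells into that same space: apply t2's centering,
# then multiply by t2's rotation (loadings) matrix
scores_t1 <- scale(t1, center = pca_t2$center, scale = FALSE) %*% pca_t2$rotation

# Cells from both time points now live in one common embedding
embedding <- rbind(scores_t1, scores_t2)
```

Note that `predict(pca_t2, newdata = t1)` does the same projection as the manual `scale(...) %*% rotation` line.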
Thank you; no matter how much I googled, I could not understand this part without your explanation.
But once I have merged the data from t1 and t2, how can I extract the non-trivial PCs from this data?
The authors define non-trivial PCs by comparison to a randomized data set. This is illustrated in figure 5e of reference 1 of the supplementary material. It is not well explained, but as I understand it, you compare the distribution of eigenvalues of the data to the distribution of eigenvalues from a randomized version of the data, and consider non-trivial the PCs whose eigenvalues fall outside the distribution from the randomized data.
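This is my reading of the idea, not the authors' code: a rough sketch in base R, where `x` is a hypothetical cells-by-genes matrix with one artificial axis of real variation added, and the null is built by permuting each gene independently (which destroys gene-gene correlations but preserves each gene's distribution):

```r
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)    # 100 cells, 20 genes (toy data)
x[, 1] <- x[, 1] + rep(c(0, 3), each = 50)  # inject one real axis of variation

obs_var <- prcomp(x)$sdev^2                 # observed PC variances (eigenvalues)

# Null distribution: shuffle each gene's values across cells and record
# the largest eigenvalue obtained from the randomized matrix
n_perm <- 100
null_max <- replicate(n_perm, {
  xr <- apply(x, 2, sample)
  max(prcomp(xr)$sdev^2)
})

# Keep PCs whose variance exceeds (say) the 95th percentile of the null
nontrivial <- which(obs_var > quantile(null_max, 0.95))
```

With this toy data, the injected group difference should make PC1 stand out above the randomized eigenvalues; the permutation count and the 95% cutoff are arbitrary choices on my part.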
Thanks a lot; still, their methodology seems impossible for me to understand and implement :(
They replied to my email and will likely share some code, but initially they mentioned the code would not be useful, as I would not be able to use it :(
This is the link to the code used in their paper, but it is impossible for me to understand :(
https://www.dropbox.com/sh/zn9b5xgssmkhnqa/AACJucOyiLcs-1WOmwerQyf3a/Subroutines?dl=0&subfolder_nav_tracking=1