Za • 6.3 years ago
Sorry, I have just read these two papers, but there does not seem to be any Git repository or R package for the methods they employed; they only describe the steps in the Methods section. How could I run the methods used in these two papers on my own data? I know this is a long shot, but most papers I have found point somewhere to the code they used.
http://science.sciencemag.org/content/early/2018/04/25/science.aar5780
http://science.sciencemag.org/content/early/2018/04/25/science.aar4362
Email the authors if you can't find any information in paper/supplementary materials (which should be on the Science site).
Thank you, I did that, but they have not responded yet.
Sorry, one of these papers says:
For each two adjacent time points, we embedded all cells from the two time-points in the PCA space learned from the second time point only, keeping non-trivial PCs as defined above. This embedding causes cell-cell distances to reflect gene expression variation between tissues, as opposed to global changes over time.
Do they mean I should perform principal component analysis on the cells from two neighbouring time points, using the PC loadings learned from the later time point, and then select the significant PCs?
This means that the authors process time points in pairs, doing PCA on the second time point and projecting the data from the first time point onto this space. This way, only consecutive time points are comparable; they do this as a heuristic to reconcile the clusters from the beginning to the end of their time series. While they justify it as reflecting "gene expression variation between tissues, as opposed to global changes over time", I think a more principled, global approach using tensor factorization is worth exploring, because the components would be the same for all time points, and these components can be interpreted as clusters and may reflect some of the biology.
Sorry, I did not understand this part
doing PCA on the second time point and project the data for the first time point onto this space
Could you please explain that in more detail?
This means applying PCA to the data of the second time point, then taking the data from the first time point and projecting it into the PCA space of the second time point.
In R, this would look something like:
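A minimal sketch of that projection step, using base R's `prcomp`. The matrices `t1` and `t2` here are made-up placeholders for your own cells-by-genes expression matrices (same genes in the same column order); they are not from the paper:

```r
set.seed(1)
t1 <- matrix(rnorm(50 * 10), nrow = 50)  # 50 cells from time point 1, 10 genes
t2 <- matrix(rnorm(40 * 10), nrow = 40)  # 40 cells from time point 2, same 10 genes

# Learn the PCA space from the second time point only
pca_t2 <- prcomp(t2, center = TRUE, scale. = FALSE)

# Scores of the t2 cells in their own PCA space
scores_t2 <- pca_t2$x

# Project the t1 cells into that same space: apply t2's centering,
# then multiply by t2's rotation (loadings) matrix
scores_t1 <- scale(t1, center = pca_t2$center, scale = FALSE) %*% pca_t2$rotation

# Cells from both time points now live in one common embedding
embedding <- rbind(scores_t1, scores_t2)
```

Note that `predict(pca_t2, newdata = t1)` does the same projection as the manual `scale(...) %*% rotation` line.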
Thank you; no matter how much I googled, I could not understand this part without your explanation.
But once I have merged the data from t1 and t2, how can I extract the non-trivial PCs from this data?
The authors define non-trivial PCs by comparison to a randomized data set. This is illustrated in figure 5e of reference 1 of the supplementary material. It is not well explained, but as I understand it, you compare the distribution of eigenvalues of the data to the distribution of eigenvalues from a randomized version of the data, and consider non-trivial the PCs whose eigenvalues fall outside the distribution from the randomized data.
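This is my reading of the idea, not the authors' code: a rough sketch in base R, where `x` is a hypothetical cells-by-genes matrix with one artificial axis of real variation added, and the null is built by permuting each gene independently (which destroys gene-gene correlations but preserves each gene's distribution):

```r
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)    # 100 cells, 20 genes (toy data)
x[, 1] <- x[, 1] + rep(c(0, 3), each = 50)  # inject one real axis of variation

obs_var <- prcomp(x)$sdev^2                 # observed PC variances (eigenvalues)

# Null distribution: shuffle each gene's values across cells and record
# the largest eigenvalue obtained from the randomized matrix
n_perm <- 100
null_max <- replicate(n_perm, {
  xr <- apply(x, 2, sample)
  max(prcomp(xr)$sdev^2)
})

# Keep PCs whose variance exceeds (say) the 95th percentile of the null
nontrivial <- which(obs_var > quantile(null_max, 0.95))
```

With this toy data, the injected group difference should make PC1 stand out above the randomized eigenvalues; the permutation count and the 95% cutoff are arbitrary choices on my part.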
Thanks a lot; still, their methodology seems impossible for me to understand and implement :(
They replied to my email and will likely share some code, but initially they mentioned the code would not be useful, as I would not be able to use it :(
This is the link to the code used in their paper, but it is impossible for me to understand :(
https://www.dropbox.com/sh/zn9b5xgssmkhnqa/AACJucOyiLcs-1WOmwerQyf3a/Subroutines?dl=0&subfolder_nav_tracking=1