I'm working with a LINCS L1000 dataset that gives the gene expression (GE) of a cell line before and after perturbation by a small molecule. I am using Level 4 data. After loading the .gct file into MATLAB, I get a 22268-by-40172 matrix, as well as a vector of column_ids and a vector of row_ids.
Using the row ids and the gene metadata txt file included in the download, I know that each row represents a gene.
I can't figure out what a column represents. Obviously, each column is a single experiment, but I can't understand what each id means.
For example, here is a column id "LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03".
So far, I know that "LJP001" refers to the LINCS Joint Project and "BT20" refers to the specific cell line. Somewhere, the id must contain information about the small molecule used as a perturbagen, but I don't know how to interpret it. Any help would be greatly appreciated!
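For anyone following along, the id can at least be split mechanically into its tokens. This is a minimal Python sketch (Python is used only for illustration; the post itself uses MATLAB). The function name `split_column_id` and the dictionary keys are hypothetical. Only the meaning of the first two tokens comes from the post; the remaining tokens and the part after the colon are returned uninterpreted.

```python
def split_column_id(column_id):
    """Split a column id into raw tokens without assigning meaning
    beyond what the post states (project/plate prefix and cell line)."""
    # Everything before ':' is underscore-separated; the part after ':'
    # is kept as-is (its meaning is not stated in the post).
    before_colon, _, after_colon = column_id.partition(":")
    tokens = before_colon.split("_")
    return {
        "project_prefix": tokens[0],      # "LJP001": LINCS Joint Project
        "cell_line": tokens[1],           # "BT20": cell line
        "remaining_tokens": tokens[2:],   # uninterpreted here
        "after_colon": after_colon,       # uninterpreted here
    }


parts = split_column_id("LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03")
print(parts)
```

Splitting like this does not reveal the small molecule by itself; it only makes it easier to compare the tokens against whatever metadata files ship with the download.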
How do you get the perturbagen from the perturbagen group?
I have a related question. In the list of gene symbols, the landmark genes come first. Second are the -666 entries, which denote unavailable predicted genes. Third are the predicted genes, of which there are almost 19000 (22268 rows in total: 978 landmark genes, about 2000 unavailable (-666) genes, and about 19000 predicted genes). In the list of predicted gene symbols (column 1), several gene symbols appear more than once, with different expression values in the same experiment. How is this possible?
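To see which symbols are repeated and on which rows, something like the following Python sketch could be used. The function name `repeated_symbols` and the toy symbol list are hypothetical and are not taken from the dataset; the `-666` placeholder is skipped because it marks unavailable symbols rather than a real gene.

```python
from collections import defaultdict


def repeated_symbols(symbols, skip=("-666",)):
    """Return {symbol: [row indices]} for symbols that occur more than once."""
    rows = defaultdict(list)
    for index, symbol in enumerate(symbols):
        if symbol in skip:
            continue  # placeholder entries are not real gene symbols
        rows[symbol].append(index)
    # Keep only the symbols that map to two or more rows.
    return {s: idx for s, idx in rows.items() if len(idx) > 1}


# Toy example with made-up symbols (not real data).
symbols = ["GENE_A", "GENE_B", "GENE_A", "-666", "GENE_C", "-666"]
repeated = repeated_symbols(symbols)
print(repeated)
```

With the row indices in hand, the corresponding row ids and expression values can be compared side by side for a single experiment.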
Please open a new question and mention this post in it. You're not really adding an answer, so why use the "Submit Answer" option?
How can I download the data?