LINCS L1000 dataset column names
8.2 years ago
wir ▴ 50

I'm working with a LINCS L1000 dataset that gives the gene expression (GE) of a cell line before and after perturbation by a small molecule. I am using Level 4 data. After loading the .gct file into MATLAB, I get a 22268-by-40172 matrix as well as a vector of column_ids and a vector of row_ids.

Using the row ids and the gene metadata txt file included in the download, I know that each row represents a gene.

I can't figure out what a column represents. Obviously, each column is a single experiment, but I can't work out what each id means.

For example, here is a column id "LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03".

So far, I know that "LJP001" refers to LINCS Joint Project and "BT20" refers to the specific cell line. Somewhere, the id must contain information about the small molecule used as a perturbagen, but I don't know how to interpret it. Any help would be greatly appreciated!

LINCS L1000 • 5.0k views
How do you get the perturbagen from the perturbagen group?

I have a related question. In the list of gene symbols, the landmark genes come first. Next come the genes marked -666, which means the gene symbol is unavailable. Last are the predicted genes, of which there are almost 19000 (22268 genes = 978 landmark genes + about 2000 unavailable genes (-666) + about 19000 predicted genes). In the list of predicted gene symbols (column 1), several gene symbols appear more than once, but with different expression values in the same experiment. How is that possible?

Please open a new question and mention this post in it. You're not really adding an answer, so why use the "Submit Answer" option?

How can I download the data?

8.2 years ago
wir ▴ 50

To answer my own question.

The column ids for Level 3 and Level 4 data are basically the distil_id. The example I posted

LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03

can be broken into

  • the perturbagen group "LJP001"
  • the cell line "BT20"
  • the brew prefix "LJP001_BT20_24H"
  • the plate index "X1_B2_DUO52HI53LO"
  • the well index "A03"
  • the distil_id "LJP001_BT20_24H_X1_B2_DUO52HI53LO_A03" (note the switch from ':' to '_')
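The breakdown above can be sketched as a small Python helper. This is only an illustration of the splitting rules listed here, not an official LINCS parser: the function name `parse_column_id`, the `ColumnId` tuple, and the label `time_point` for the third field are my own assumptions, and it assumes that the cell line name itself contains no underscore.

```python
from typing import NamedTuple


class ColumnId(NamedTuple):
    pert_group: str   # e.g. "LJP001"
    cell_line: str    # e.g. "BT20"
    time_point: str   # e.g. "24H" (calling this a time point is an assumption)
    brew_prefix: str  # first three fields joined by "_"
    plate_index: str  # remaining fields before the ":"
    well: str         # e.g. "A03"
    distil_id: str    # column id with ":" replaced by "_"


def parse_column_id(column_id: str) -> ColumnId:
    """Split a Level 3/4 column id into the parts listed above."""
    plate, sep, well = column_id.rpartition(":")
    if not sep:
        raise ValueError(f"no ':' separator in column id: {column_id!r}")

    fields = plate.split("_")
    if len(fields) < 4:
        raise ValueError(f"too few '_' separated fields in: {column_id!r}")

    # Assumption: group, cell line and the third field never contain "_".
    pert_group, cell_line, time_point = fields[:3]
    return ColumnId(
        pert_group=pert_group,
        cell_line=cell_line,
        time_point=time_point,
        brew_prefix="_".join(fields[:3]),
        plate_index="_".join(fields[3:]),
        well=well,
        distil_id=f"{plate}_{well}",
    )


parts = parse_column_id("LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03")
print(parts.brew_prefix, parts.plate_index, parts.well)
```

If your cell line names can contain underscores, the fixed three-field split will not work and you would need the metadata to resolve the parts instead.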

It turns out that the distil_id doesn't contain enough information to identify the perturbagen used. To identify it, you need to use the LINCS API. Here is more information about using the LINCS API to query the metadata. I also used this Coursera video as a reference. Note that the example given in the question doesn't work with the API.

2.4 years ago
Yep ▴ 20

As of 2022, they seem to provide more information. By querying the siginfo and compoundinfo csv files, we can look up the perturbagen id and related metadata.
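A rough sketch of that lookup in Python, using only the standard library. The column names (`sig_id`, `pert_id`, `distil_ids`, `cmap_name`), the `|` separator inside `distil_ids`, and the helper name `build_lookup` are assumptions on my part, so check them against the headers of the files you actually downloaded. The rows below are made-up placeholders, not real LINCS records.

```python
import csv
import io

# Toy stand-ins for the siginfo and compoundinfo files.
# Every id and name here is a made-up placeholder.
SIGINFO = """sig_id,pert_id,distil_ids
SIG_EXAMPLE_1,PERT_EXAMPLE_1,PLATE_X1:A03|PLATE_X2:A03
SIG_EXAMPLE_2,PERT_EXAMPLE_2,PLATE_X1:A04
"""

COMPOUNDINFO = """pert_id,cmap_name
PERT_EXAMPLE_1,compound-one
PERT_EXAMPLE_2,compound-two
"""


def build_lookup(siginfo_file, compoundinfo_file, delimiter=","):
    """Map each distil_id to a (pert_id, compound name) pair."""
    # pert_id -> compound name (assumed column: cmap_name)
    names = {
        row["pert_id"]: row["cmap_name"]
        for row in csv.DictReader(compoundinfo_file, delimiter=delimiter)
    }

    lookup = {}
    for row in csv.DictReader(siginfo_file, delimiter=delimiter):
        # Assumption: one signature lists several distil_ids joined by "|".
        for distil_id in row["distil_ids"].split("|"):
            lookup[distil_id] = (row["pert_id"], names.get(row["pert_id"]))
    return lookup


lookup = build_lookup(io.StringIO(SIGINFO), io.StringIO(COMPOUNDINFO))
print(lookup["PLATE_X1:A03"])
```

With real files you would pass open file handles instead of `io.StringIO`, and `delimiter="\t"` if the files turn out to be tab-separated.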
