Question

What kind of annotation do these NCBI GEO datasets use?

0

6.1 years ago

lvitale • 0

I'm a Computer Science student and I'd like to build a bioinformatics application. I'm looking for gene expression datasets and found two interesting ones on NCBI, but I don't understand what kind of annotation they use.

The first one is: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73072

I read that the experiment covers 12,023 genes, but I don't understand which annotation is used. The first "genes" are: "10000_at" "10001_at" "10002_at" "10003_at" "10004_at" "10005_at" "10006_at" "10007_at" "10009_at" "1000_at". My question: are these identifiers gene IDs? If so, why is there an "_at" at the end? How can I convert this kind of annotation to gene symbols?

The second one is: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99039

In this case the annotation looks similar to the first dataset: "1007_s_at" "1053_at" "117_at" "121_at" "1255_g_at" "1294_at" "1316_at" "1320_at"
However, the number of "genes" available is 54,675, whereas as far as I know the estimated number of human protein-coding genes is 19,000-20,000. Can I convert this kind of data to gene symbols?

Thank you so much

disease dataset • 1.5k views

updated 6.1 years ago by bharata1803 ▴ 560 • written 6.1 years ago by lvitale • 0

score 3 · Accepted Answer · 2018-11-15

3

6.1 years ago

bharata1803 ▴ 560

The annotation comes from the microarray platform. Check the platform information every time you download public data, because datasets may come from different platforms. I believe there is an R library that can decode the gene annotation for each microarray platform, but you can also download the annotation table manually from the platform page on the NCBI website.

For the first dataset, the platform is this: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL14604
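
The IDs in the first dataset ("10000_at", "1000_at", ...) look like those of a gene-centric custom CDF, where the part before "_at" is an Entrez Gene ID. That is an assumption and should be confirmed on the GPL14604 page before relying on it. If it holds, a minimal Python sketch for recovering the numeric IDs could look like this (the function name `probe_to_entrez` is made up for illustration):

```python
import re

# Assumption: IDs have the form "<EntrezGeneID>_at" (gene-centric custom CDF).
# Anything else (e.g. "1007_s_at" or control probes) is left unmapped.
_ENTREZ_AT = re.compile(r"^(\d+)_at$")


def probe_to_entrez(probe_id):
    """Return the Entrez Gene ID as an int, or None if the ID does not match."""
    match = _ENTREZ_AT.match(probe_id)
    return int(match.group(1)) if match else None


# IDs quoted in the question, plus one control-style ID that should not match.
ids = ["10000_at", "10001_at", "1000_at", "AFFX-BioB-5_at"]
entrez = {probe: probe_to_entrez(probe) for probe in ids}
```

The resulting Entrez Gene IDs can then be translated to gene symbols with any gene annotation resource; IDs that return `None` need the platform table instead.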

For the second dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570
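
GPL570 lists 54,675 probe sets, not genes: several probe sets can target the same gene, and some (such as controls) have no gene symbol, which is why the count exceeds the number of protein-coding genes. As a hedged sketch of the manual route, the following Python code builds a probe-to-symbol mapping from a downloaded platform table. It assumes the table is tab-delimited, that metadata lines start with `#`, `!` or `^`, and that the columns are named `ID` and `Gene Symbol`; check the header of the file you actually download. The function names and the small inline excerpt are illustrative only.

```python
import csv
import io


def load_probe_map(handle, id_col="ID", symbol_col="Gene Symbol"):
    """Build {probe_id: [gene symbols]} from a tab-delimited platform table."""
    # Skip blank lines and metadata lines starting with '#', '!' or '^'.
    rows = (line for line in handle if line.strip() and line[0] not in "#!^")
    reader = csv.DictReader(rows, delimiter="\t")
    mapping = {}
    for row in reader:
        raw = (row.get(symbol_col) or "").strip()
        # One probe set may list several symbols separated by "///".
        mapping[row[id_col]] = [s.strip() for s in raw.split("///") if s.strip()]
    return mapping


def symbols_to_probes(mapping):
    """Invert the mapping: {gene symbol: [probe_ids]}."""
    inverse = {}
    for probe, symbols in mapping.items():
        for symbol in symbols:
            inverse.setdefault(symbol, []).append(probe)
    return inverse


# Illustrative excerpt only; verify against the real GPL570 table.
SAMPLE = (
    "#ID = probe set identifier\n"
    "ID\tGene Symbol\n"
    "1007_s_at\tDDR1 /// MIR4640\n"
    "1053_at\tRFC2\n"
    "117_at\tHSPA6\n"
    "121_at\tPAX8\n"
    "AFFX-BioB-5_at\t\n"
)

probe_map = load_probe_map(io.StringIO(SAMPLE))
gene_map = symbols_to_probes(probe_map)
```

With a real file, pass an open file handle instead of `io.StringIO(SAMPLE)`. How to combine expression values when several probe sets map to one gene (for example averaging, or keeping one probe set) is a separate analysis decision that this sketch does not make.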