Question

transcriptomic data for molecular subtyping

0

13 months ago

mavy ▴ 10

Hello all,

I am a computer scientist with little knowledge of genomic data. I want to use machine learning to classify the molecular subtype of a particular cancer.

I plan to use RNA-seq data, but I am unsure what format would work best for this problem. Should it be a gene expression matrix with genes as rows and samples as columns, or should the samples be the rows? And where can I get such data?

I would appreciate it if anyone could guide me on this or provide any leads.

Thank you in advance

molecular-subtyping • 730 views

updated 13 months ago by mark.ziemann ★ 2.0k • written 13 months ago by mavy ▴ 10

score 1 · Answer 1 · 2024-02-20

1

13 months ago

mark.ziemann ★ 2.0k

You can obtain cancer profiling data from TCGA through the GDC Data Portal (https://portal.gdc.cancer.gov/). This includes genome sequences, gene expression, chromatin modifications and others. Typically genes are the rows and samples are the columns. When dealing with cancer data, be aware that tumor samples may be contaminated with some healthy tissue, which complicates the analysis a bit.
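
On the rows-versus-columns question, here is a minimal sketch of moving from the genes-as-rows layout to the samples-as-rows layout that most machine learning libraries (scikit-learn, for example) expect. The matrix, gene names, sample names and subtype labels are all made up for illustration, and the log2(count + 1) transform is just one common choice, not a recommendation specific to TCGA data.

```python
import numpy as np
import pandas as pd

# Toy genes x samples count matrix (made-up values, hypothetical names).
rng = np.random.default_rng(0)
genes = [f"gene_{i}" for i in range(5)]
samples = [f"sample_{j}" for j in range(6)]
expr = pd.DataFrame(rng.poisson(50, size=(5, 6)), index=genes, columns=samples)

# Hypothetical subtype labels, one per sample.
labels = pd.Series(["A", "A", "A", "B", "B", "B"], index=samples)

# Transpose so that each row is a sample and each column is a gene (feature),
# then apply a log2(count + 1) transform to compress the dynamic range.
X = np.log2(expr.T + 1)

# Align the labels to the row order of X before handing both to a classifier.
y = labels.loc[X.index]

print(X.shape)  # (n_samples, n_genes)
```

Keeping the sample identifiers as the index of both `X` and `y` makes it harder to accidentally mismatch labels and expression profiles.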

13 months ago by mark.ziemann ★ 2.0k

0

I wouldn't necessarily see the "normal samples" as contaminants. They could serve as something like "internal positive controls" for unsupervised learning approaches, on the assumption that normal tissue samples are more similar to each other than to tumor samples. Having said that, I am sure there will be cases where the biology of some normal samples is closer to tumor samples than to other normals.

13 months ago by Haci ▴ 730

1

You may have misunderstood. When cancers are biopsied, some non-malignant tissue is sometimes harvested as well. This makes the "cancer" samples, in effect, a mixture of normal and cancer cells. The proportion of normal and cancer material in a sample varies a lot between biopsies, and this can be an unwanted source of noise in a dataset, which can be addressed by deconvolution techniques or by statistical correction.
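
As a rough sketch of what a simple statistical correction could look like, the following assumes a per-sample tumor purity estimate (a fraction between 0 and 1) is already available from some separate tool, and removes the linear purity trend from each gene by least squares. The function name `regress_out_purity` and the toy numbers are hypothetical. This is a blunt approach: if a subtype is itself associated with purity, it may remove real signal as well.

```python
import numpy as np

def regress_out_purity(expr, purity):
    """Remove the linear effect of tumor purity from each gene.

    expr   : array of shape (n_samples, n_genes), e.g. log-expression values
    purity : array of shape (n_samples,), assumed tumor fraction per sample
    Returns an array of the same shape as expr; per-gene means are preserved.
    """
    expr = np.asarray(expr, dtype=float)
    purity = np.asarray(purity, dtype=float)
    # Design matrix with an intercept column and the purity covariate.
    design = np.column_stack([np.ones_like(purity), purity])
    # One least-squares fit per gene; coef[1] holds the purity slopes.
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    # Subtract only the mean-centered purity effect from each gene.
    return expr - np.outer(purity - purity.mean(), coef[1])

# Toy data: gene 0 tracks purity, gene 1 does not (made-up values).
rng = np.random.default_rng(1)
purity = np.array([0.3, 0.5, 0.7, 0.9, 0.6, 0.8])
expr = np.column_stack([
    2 + 3 * purity + rng.normal(0, 0.1, 6),
    5 + rng.normal(0, 0.1, 6),
])
corrected = regress_out_purity(expr, purity)
```

After the correction, each gene is linearly uncorrelated with the purity estimate, which can be checked with `np.corrcoef(corrected[:, 0], purity)`.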

13 months ago by mark.ziemann ★ 2.0k