Hello,
I have normalized gene expression data for 1095 patients with breast cancer. Part of the data looks like this:
Gene patient1 patient2 patient3 patient4 patient5 patient6
AASS 80.8588 135.3218 158.7152 20.7441 187.836 126.2016
AATF 2012.3344 990.1661 727.4445 1498.5344 1329.6371
AATK 179.534 209.8278 35.4275 13.5558 99.1263 51.2694
ABAT 2086.3408 77.9285 600.3779 101.0147 1564.1801 1439.816
ABCA11P 79.5614 91.9899 152.9806 65.3844 36.7641 85.9551
ABCA12 43.8556 1.0531 37.317 10.823 82.3253 4.9298
ABCA13 21.9278 2.6327 0.9447 0.9019 0.672 0
Can I use this data to find differentially expressed genes? Which R package can I use to get the DE genes?
So you want to find differentially expressed genes, but you only have one group. Do you even know what differential expression means?
In a differential expression analysis you want to compare the expression of one group (e.g. patients) with another group (e.g. healthy controls). You want to find out which genes are differentially expressed (over- or underexpressed) in patients versus the control group.
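To make the two-group idea concrete, here is a minimal sketch in plain Python that computes a log2 fold change and a Welch t statistic for one gene. It uses the AASS values from the table above, but the split into two groups is purely hypothetical (the posted data contains tumor samples only), and the helper names `log2p1` and `welch_t` are made up for illustration. A real analysis would use a dedicated package and correct for testing thousands of genes.

```python
import math
import statistics

# AASS values from the table above. The group split is HYPOTHETICAL:
# the thread's data has no control samples, this only shows the mechanics.
group_a = [80.8588, 135.3218, 158.7152]  # hypothetical "patients"
group_b = [20.7441, 187.836, 126.2016]   # hypothetical "controls"


def log2p1(values):
    """Apply the log2(x + 1) transform to each value."""
    return [math.log2(v + 1.0) for v in values]


def welch_t(x, y):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_x = statistics.variance(x)
    var_y = statistics.variance(y)
    standard_error = math.sqrt(var_x / len(x) + var_y / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


a = log2p1(group_a)
b = log2p1(group_b)

# On the log scale, the difference of group means is the log2 fold change.
log2_fold_change = statistics.mean(a) - statistics.mean(b)
t_stat = welch_t(a, b)

print(f"log2 fold change: {log2_fold_change:.3f}, Welch t: {t_stat:.3f}")
```

With only one group there is nothing to put in `group_b`, which is why a differential expression analysis needs a second condition.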
Sorry, but I'm still learning how to analyze genomic data. I understand that DE should be between two groups (normal, tumor). So I guess my question should be: how do I visualize the distribution of 20,000 genes in breast cancer patients?
What is the aim of your analysis? What is the biological question you are trying to solve?
I'm trying to see if there is an association between gene expression and genotype (SNP) data. One assumption in a regression is that the data are normally distributed, so I'm trying to find a way to visualize the distribution of the gene expression data.
That would be an eQTL analysis. You may want to have a look at this tutorial.
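As a rough illustration of what an eQTL test does for a single gene-SNP pair, the sketch below regresses log2(x+1) expression on genotype dosage using ordinary least squares in plain Python. The expression values are the ABAT row from the table above; the genotype vector is entirely made up for illustration, and `simple_linear_regression` is a hypothetical helper, not the API of any eQTL package. Note that in a linear model the normality assumption concerns the residuals, so those are what you would inspect.

```python
import math

# ABAT expression for the six patients shown in the table above.
expression = [2086.3408, 77.9285, 600.3779, 101.0147, 1564.1801, 1439.816]

# HYPOTHETICAL genotype dosages (0, 1 or 2 copies of the minor allele)
# for one SNP. These are invented and are not part of the thread's data.
genotype = [2, 0, 1, 0, 2, 1]


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Transform first, then fit: the slope is the change in log2 expression
# per additional copy of the allele.
log_expr = [math.log2(v + 1.0) for v in expression]
slope, intercept = simple_linear_regression(genotype, log_expr)

# Residuals are what the normality assumption of the model refers to.
residuals = [y - (intercept + slope * g) for g, y in zip(genotype, log_expr)]

print(f"slope per allele: {slope:.3f}, intercept: {intercept:.3f}")
```

A real eQTL analysis repeats this over many gene-SNP pairs, usually with covariates, and corrects for multiple testing.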
Does the package apply the log2(x+1) transform itself, or do I have to do that before running the eQTL analysis? It uses regression, so the assumptions have to be validated.
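Whether a particular eQTL package transforms the data internally is not stated in this thread, so the sketch below only shows how the log2(x+1) transform could be applied by hand and how its effect on skew could be checked. It uses the ABCA13 row from the table above, which contains a zero (the reason for adding 1 before taking the log). The helper names `log2p1` and `skewness` are made up for illustration.

```python
import math
import statistics

# ABCA13 values from the table above. The last value is 0, and log2(0)
# is undefined, which is why 1 is added before taking the log.
raw = [21.9278, 2.6327, 0.9447, 0.9019, 0.672, 0.0]


def log2p1(values):
    """Apply the log2(x + 1) transform to each value."""
    return [math.log2(v + 1.0) for v in values]


def skewness(values):
    """Population skewness: third central moment divided by sd cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    third_moment = sum((v - mean) ** 3 for v in values) / len(values)
    return third_moment / sd ** 3


transformed = log2p1(raw)

print(f"skewness raw: {skewness(raw):.2f}")
print(f"skewness after log2(x+1): {skewness(transformed):.2f}")
```

Six values are far too few to judge a distribution. With all 1095 patients, a histogram or Q-Q plot of the transformed values per gene would be the more usual check.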
Asking for diff expressed genes was probably not the right question here.
Are you interested in classifying these breast cancer samples into sub-types? Like in this paper? DEGs between individual patients or groups (cancer vs normal)?
What I want is to find the DE genes and visualize their distribution, so I think the comparison should be between phenotypes, but I'm not sure.
What is the difference between DE genes between samples and DE genes between groups (cancer, normal)?
I think you need to think about a good research question first, before you can get some good answers.
Well, I'm new to biology and genetics. I'm trying my best.
I wasn't trying to dis you or put you down or anything. It's just that I see many scientists who are new to bioinformatics expecting to get answers without formulating a research question first. Bioinformatics is just like any other science: you define a research question first.
What is your goal with your data set? Know the differences between subsets of cancer? Or the difference between cancer and healthy? Etc.
Are these FPKM values?
They are TPM values.