I’m not sure how familiar folks are with the Bioconductor "Biobase" datasets, but one of the free datasets is called ALL (Acute lymphoblastic leukemia). My goal is to calculate and visualize some basic properties of gene expression levels from the ALL dataset. I’m getting stuck on how to write a line of code that's specifically for patients, and another just for genes. I understand that rows = genes and columns = patients, but I don't have the syntax down to specify patients versus genes. For example, I have tried:
> hist(exprs(ALL))
And
> median(exprs(ALL))
[1] 5.469092
I’m not happy with the results, and I know I’m doing something wrong. For example…
> hist(exprs(ALL))
…will not calculate the average gene expression level for each gene, and…
> median(exprs(ALL))
…will not find the average gene expression per gene. Any guidance is super appreciated!
I know from...
...that my data has 12625 rows (genes) and 128 columns (patients). When I run the following:
I'm expecting a single middle number for the entire dataset, but instead I'm getting a super long list (below is a sample):
For example,
Also, for a histogram I know that:
...works, however,
for rows (genes), and
for columns (patients) does not work. Why?
Also, many thanks for your reply post! Very helpful!!
First, you are misunderstanding apply(). Run over the rows (MARGIN = 1), it will calculate the median for each gene, so you will have a median value for each gene, that is, 12625 median values. Run over the columns (MARGIN = 2), it will calculate the median for each patient, so you will have a median value for each patient, that is, 128 median values.
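A minimal sketch of the row/column distinction in R. To keep it self-contained, it uses a small random matrix named m as a hypothetical stand-in for exprs(ALL); the values and dimensions are made up, and only the shape (genes in rows, patients in columns) mirrors the real data.

```r
# Hypothetical stand-in for exprs(ALL): 5 "genes" (rows) x 3 "patients" (columns).
# With the Bioconductor ALL and Biobase packages installed you could instead use:
#   library(ALL); data(ALL); m <- Biobase::exprs(ALL)   # 12625 x 128
set.seed(1)
m <- matrix(rnorm(5 * 3, mean = 6), nrow = 5, ncol = 3,
            dimnames = list(paste0("gene", 1:5), paste0("patient", 1:3)))

overall_median  <- median(m)             # a single number: median of all 15 values
gene_medians    <- apply(m, 1, median)   # MARGIN = 1: one median per row (gene)
patient_medians <- apply(m, 2, median)   # MARGIN = 2: one median per column (patient)

length(gene_medians)     # 5 here; 12625 for ALL
length(patient_medians)  # 3 here; 128 for ALL

# apply() returns a new named numeric vector (names taken from the row or column
# names) and leaves m unchanged; unname() drops the names if only values are needed.
head(gene_medians)
unname(patient_medians)
```

median(m) is the one-number summary of the whole matrix; the per-gene and per-patient versions differ only in the MARGIN argument passed to apply().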
For the histograms, you need to use apply() inside the hist() call as well. To create a histogram of median patient values, compute the per-patient medians with apply() over the columns and pass that result to hist().

Many thanks - this was very helpful!
EDIT: The problem was with my eyesight, not your command. Apologies!
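A sketch of the histogram advice above (wrapping apply() inside hist()), again on a hypothetical random matrix m standing in for exprs(ALL). hist() simply bins whatever numbers it is given, so the per-gene or per-patient summary has to be computed first.

```r
# Hypothetical stand-in for exprs(ALL): 200 "genes" (rows) x 10 "patients" (columns).
set.seed(1)
m <- matrix(rnorm(200 * 10, mean = 6), nrow = 200, ncol = 10,
            dimnames = list(paste0("gene", 1:200), paste0("patient", 1:10)))

# Histogram of per-gene medians: apply() over rows, then hist() on the result.
h_genes <- hist(apply(m, 1, median),
                main = "Median expression per gene", xlab = "Median expression")

# Histogram of per-patient medians: apply() over columns.
h_patients <- hist(apply(m, 2, median),
                   main = "Median expression per patient", xlab = "Median expression")

# hist() on the whole matrix pools all 200 * 10 values into one histogram,
# which is what hist(exprs(ALL)) in the question does.
h_all <- hist(m, plot = FALSE)
sum(h_all$counts)   # 2000
```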
Are you even reading the answers? I see no apply() in your commands.

You wanted per-gene medians, right? How would that be one number?

You'll need to use apply() within hist(). apply() does not change the dataset it works on. Most commands in most languages don't modify their input arguments (R functions, for instance, generally work on copies), so changing an input argument would be unexpected behavior.

I think you are correct, thanks Ram.
So what am I returning with the following command?
You're returning the median of all m-by-n numbers. Read Friederike's examples carefully.

What you're getting is called a named vector (a numeric vector whose names are the row names of the matrix), not a list. You can use unname() to drop the names, or a relevant tidyverse converter (tibble::rownames_to_column() comes to mind, applied to a data frame built from the result) to get the names as a separate column.

I strongly doubt that hist(exprs(ALL, 1)) works. You gotta be a bit more careful here: commas matter, and so do brackets. Also notice how your two example commands (if the comma is placed elsewhere) return two very different things, which, unsurprisingly, leads hist() to do very different things with them.

Many thanks Friederike - your examples were very helpful.
You're welcome. Good luck with your analyses!
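To illustrate the point that commas and brackets matter, here is a small sketch of base R matrix indexing, again on a hypothetical random matrix m rather than the real exprs(ALL). Inside [ ], the comma separates the row index from the column index; moving it or leaving it out selects something quite different.

```r
# Hypothetical stand-in for exprs(ALL): 4 "genes" (rows) x 3 "patients" (columns).
set.seed(1)
m <- matrix(rnorm(4 * 3, mean = 6), nrow = 4, ncol = 3,
            dimnames = list(paste0("gene", 1:4), paste0("patient", 1:3)))

gene1    <- m[1, ]   # row 1: gene1's values for all 3 patients
patient1 <- m[, 1]   # column 1: patient1's values for all 4 genes
first    <- m[1]     # no comma: m is treated as one long vector, so one value

length(gene1)      # 3
length(patient1)   # 4
length(first)      # 1

# Passing gene1 or patient1 to hist() therefore plots very different data,
# and neither is the same as apply(m, 1, median) or apply(m, 2, median).
```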