Hello,
I have data in the form of a data frame that I downloaded from the GTEx Portal. It contains the RNA-Seq gene read counts used in their study.
> dim(expr.df)
[1] 55993 2921
> expr.df[1:10,1:2]
GTEX-N7MS-0007-SM-2D7W1 GTEX-N7MS-0008-SM-4E3JI
ENSG00000223972 0 0
ENSG00000227232 158 166
ENSG00000243485 0 0
ENSG00000237613 0 0
ENSG00000268020 0 0
ENSG00000240361 0 0
ENSG00000186092 0 0
ENSG00000238009 17 2
ENSG00000233750 35 0
ENSG00000237683 8489 34
I checked the sample information file and there is no information about the conditions. I want to normalize the raw counts. For that, I want to use DESeq's getVarianceStabilizedData() function. However, this function takes a CountDataSet object as input. So I tried to make a CountDataSet object like this:
> cds <- newCountDataSet(countData = as.matrix(expr.df))
Error in is(conditions, "matrix") :
argument "conditions" is missing, with no default
The error asks me to specify the conditions, but there are no conditions in this dataset. How can I normalize these values?
I think you're getting into variance there. You just want to normalize for the number of reads sequenced, right?
I believe DESeq still uses median normalization (a median of per-gene ratios against a reference sample).
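For what it's worth, here is a rough base-R sketch of that median-of-ratios idea as I understand it, so you can see what the size factors are without needing a CountDataSet. The function name size_factors and the toy matrix are made up for illustration; this is not DESeq's own code.

```r
# Sketch of median-of-ratios size factors in base R (no DESeq needed).
# `size_factors` and `counts` are hypothetical names; the numbers are toy data.
size_factors <- function(counts) {
  counts <- as.matrix(counts)
  # Per-gene mean of log counts = log of the geometric mean across samples.
  log_geo_mean <- rowMeans(log(counts))
  # Genes with a zero in any sample give -Inf here and are skipped.
  keep <- is.finite(log_geo_mean)
  # Per sample: median log-ratio to the per-gene reference, then exponentiate.
  apply(counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_mean[keep]))
  })
}

# Toy matrix: rows are genes, columns are samples; s2 is sequenced twice as deep.
counts <- cbind(s1 = c(0, 158, 17, 8489, 40),
                s2 = c(0, 316, 34, 16978, 80))
sf <- size_factors(counts)

# Divide each column by its size factor to get normalized counts.
normalized <- sweep(counts, 2, sf, "/")
```

On real data you would pass as.matrix(expr.df) instead of the toy matrix. With thousands of samples, very few genes may be nonzero everywhere, so check how many genes survive the filter before trusting the factors.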
I don't know the commands in DESeq but if you want to do it by hand here is the basic process:
Do a scatter plot of your two samples, i.e. GTEX-N7MS-0007-SM-2D7W1 on the X axis and GTEX-N7MS-0008-SM-4E3JI on the Y axis.
a = the median count value in GTEX-N7MS-0007-SM-2D7W1
b = the median count value in GTEX-N7MS-0008-SM-4E3JI
Your slope, and the median normalization factor, is b/a.
Plot the line with that slope through your data (y-intercept = 0) and see if it fits.
Sometimes it works, sometimes you need to use something different.
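The by-hand steps above could look something like this in R. The counts are toy values standing in for the two GTEx columns. Restricting the medians to genes detected in both samples is my own assumption, added because a table with this many zero-count genes can easily have a median of 0.

```r
# Toy counts for two samples; illustrative values, not real GTEx data.
x <- c(0, 158, 0, 17, 35, 8489, 60)    # stands in for GTEX-N7MS-0007-SM-2D7W1
y <- c(0, 316, 0, 34, 70, 16978, 120)  # stands in for GTEX-N7MS-0008-SM-4E3JI

# Assumption: use only genes with nonzero counts in both samples, otherwise
# the many zeros can drag the median (and the factor) to 0.
expressed <- x > 0 & y > 0
a <- median(x[expressed])
b <- median(y[expressed])
slope <- b / a  # median normalization factor

# Rescale the second sample onto the scale of the first.
y_norm <- y / slope

# Visual check (interactive sessions only): points should sit near the line.
if (interactive()) {
  plot(x[expressed], y[expressed], xlab = "sample 1", ylab = "sample 2")
  abline(a = 0, b = slope)  # line through the origin with slope b/a
}
```

If the points bend away from the line at high or low counts, that is the "sometimes you need something different" case.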