Hi guys, I have expression values for nearly 19,000 transcripts for two different groups, stored in two separate matrices: group1 (n=30) and group2 (n=60). So for each transcript, I have about 90 values.
The data format for Group 1 is shown below; Group 2 has the same format.
GENE Group1 Group1 Group1 Group1
ENSG00000183154.1 0.925443037 3.0369279927 2.7557872516 2.2384197806
ENSG00000227210.1 2.0079999555 3.2941987268 1.7373249132 1.8805827534
ENSG00000240591.1 1.833712588 2.8881138203 2.8422437594 1.654280957
ENSG00000279342.1 2.1612707148 2.3112540357 3.4992176678 2.3862284068
ENSG00000248383.4 2.2085886874 2.7214426016 0 0
ENSG00000253837.1 1.384282608 3.4071437949 1.7373249132 1.3734792256
ENSG00000107165.11 1.6305563 1.2443063796 2.3422637913 3.8709988618
So for each gene I want to know whether there are differences between the groups, and also whether the average expression differs. I'm planning to do a chi-squared test in R, but I'm not sure which method would be best. Any suggestions?
Thanks!
Do you have access to the raw read counts? If so, please read https://f1000research.com/articles/4-1070/v2
Yes, this is exactly what I did. In more detail, the matrix is derived from varianceStabilizingTransformation, but I can't find a method to test each transcript for differences between the groups, or to tell whether the average expression differs. I have the PCA and the heatmap, but I also need those values. Any ideas?
Okay, I'm slightly confused. It sounds like you want to do a differential expression analysis (an example of which you can find in the workflow I linked). Is that correct?
Yes it is! But I couldn't find a way to get the differential expression for each gene (e.g. to say whether ENSG00000183154.1 is more highly expressed in group1 or group2). So is there a function to do that?
If you look at the "Building the results table" section of that workflow, you'll see that the "results" function will give you a table showing which genes are up- or down-regulated in one group relative to the other.
So is the function
res <- results(dds)
which reports a p-value for each gene, giving me that information? Or should I instead use sum(res$pvalue < 0.05, na.rm=TRUE) to find the statistically significant differences?
sum(res$pvalue < 0.05, na.rm=TRUE)
will give you the number of DEGs.
results(dds)
will give you the list. You might also find this useful: DESeq2 proper design setting
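To make the two calls above concrete, here is a minimal sketch of the whole step. It assumes DESeq2 is installed and uses simulated counts from makeExampleDESeqDataSet as a stand-in for the real data, so the column name "condition" and the levels "A" and "B" are placeholders for your own group labels. It also counts on the adjusted p-value (padj) rather than the raw p-value, which is the usual choice when testing ~19,000 genes at once.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption: your own dds is built
# from raw read counts, not from the variance-stabilized matrix).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
# The example colData has one factor, 'condition', with levels "A" and "B".

# Fit the model: size factors, dispersions, per-gene Wald tests.
dds <- DESeq(dds)

# One row per gene. With this contrast, a positive log2FoldChange means
# higher expression in "B" than in "A".
res <- results(dds, contrast = c("condition", "B", "A"))

# Number of genes below a 0.05 cutoff on the adjusted p-value.
# padj can be NA for some genes, hence na.rm = TRUE.
n_sig <- sum(res$padj < 0.05, na.rm = TRUE)
print(n_sig)

# The list itself, most significant genes first.
res_ordered <- res[order(res$padj), ]
head(res_ordered)
```

The sign of log2FoldChange is what answers "is this gene higher in group1 or group2", so it is worth setting the contrast explicitly instead of relying on the default factor level order.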
And how can I export the IDs of the transcripts that are differentially expressed between the two groups? Is the output a data frame?
I don't remember what class the output is, but typing
class(results(dds))
will tell you. Anyway, you just need to run
write.table(results(dds), file="yourfilenamechoice.table")
to export them into a file.
Many thanks for the information!
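If only the significant transcripts are wanted in the file, the table can be filtered before writing. The sketch below is one way to do it, under assumptions: res_df is a small made-up stand-in for as.data.frame(results(dds)), the gene names are invented, and the "group2"/"group1" labels assume the contrast was set up as group2 relative to group1.

```r
# Toy stand-in for as.data.frame(results(dds)); all values are made up
# purely to show the filtering and export steps.
res_df <- data.frame(
  log2FoldChange = c(1.8, -0.2, -2.1, 0.4),
  pvalue         = c(0.0001, 0.40, 0.0004, NA),
  padj           = c(0.001, 0.60, 0.002, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Keep rows below the cutoff; which() silently drops rows where padj is NA.
sig <- res_df[which(res_df$padj < 0.05), ]

# Direction of change (assumes the contrast was group2 vs group1).
sig$higher_in <- ifelse(sig$log2FoldChange > 0, "group2", "group1")

# Write the filtered table; row names (the transcript IDs) become the
# first column of the CSV.
out_file <- file.path(tempdir(), "significant_genes.csv")
write.csv(sig, file = out_file)

# Just the IDs, if that is all that is needed.
sig_ids <- rownames(sig)
print(sig_ids)
```

With real DESeq2 output, replacing the toy data frame with as.data.frame(res) should leave the rest unchanged.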