The first approach that comes to mind is to try this in R:
plot(cmdscale(dist(iris[,1:4])), col=iris[,5])
BUT: I think you might have too many rows for MDS to be readily applicable for scatter-plotting. No single organism has 125k genes, so you are possibly using a concatenation of gene expression measurements from different organisms. You should consider whether your problem is ill-posed, and whether you could reduce the number of rows by filtering genes or by annotating known orthologs.
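To put a number on "too many rows": dist() stores n(n-1)/2 doubles, so a row-wise MDS on 125,000 rows runs out of memory long before plotting. The sketch below does that arithmetic and then shows one simple way to cut rows down, a low-count filter. The filter runs on simulated counts, and both thresholds (10 reads, 3 samples) are arbitrary assumptions for illustration, not a recommendation.

```r
# Back-of-envelope: memory that dist() alone would need for 125,000 rows
n <- 125000
n_pairs <- n * (n - 1) / 2      # dist() stores only the lower triangle
dist_gb <- n_pairs * 8 / 1e9    # 8 bytes per double
print(dist_gb)                  # about 62.5 GB; cmdscale() would then expand it to a full n x n matrix

# Hypothetical low-count filter on simulated counts (1,000 rows x 9 samples):
# keep rows with at least 10 reads in at least 3 samples.
# Both thresholds are arbitrary choices for illustration.
set.seed(42)
counts <- matrix(rnbinom(1000 * 9, mu = 5, size = 0.5), ncol = 9)
keep <- rowSums(counts >= 10) >= 3
filtered <- counts[keep, , drop = FALSE]
cat(sum(keep), "of", nrow(counts), "rows kept\n")
```

On real data the thresholds should depend on library sizes; the edgeR manual discusses filtering in its data preparation section.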
In any case, you should consider using smoothScatter for the resulting data, because even with only 30k points, plot just gives a black blob.
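A minimal side-by-side comparison, using simulated points (30,000 normally distributed pairs standing in for real MDS coordinates, an assumption for illustration only). smoothScatter() is in base R's graphics package and shades by point density instead of drawing every point.

```r
# Simulated data only: 30,000 points standing in for per-row MDS coordinates
set.seed(1)
n <- 30000
xy <- cbind(rnorm(n), rnorm(n, sd = 0.5))

op <- par(mfrow = c(1, 2))                  # two panels side by side
plot(xy, pch = 20, main = "plot(): overplotted")
smoothScatter(xy, main = "smoothScatter(): density shading")
par(op)                                     # restore previous layout
```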
Edit: As you are using edgeR, you can use the function plotMDS() as described in the manual: http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf pp. 63-65 (data preparation with calcNormFactors starts on p. 63).
This produces an MDS plot of the columns (samples) instead of the rows, similar to:
plot(cmdscale(dist(t(iris[,1:4]))), type = "n")
text(cmdscale(dist(t(iris[,1:4]))), labels=colnames(iris)[1:4])
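Applied to a count matrix, the same column-wise idea might look like the base R sketch below. The counts are simulated, the sample names (3 conditions x days x, y, z) are hypothetical, and the log2(count + 1) transform is an assumption; edgeR's plotMDS() uses its own log-CPM values and distance definition, so its plot will not be identical.

```r
# Simulated stand-in for a 9-sample count matrix; names are hypothetical
set.seed(7)
counts <- matrix(rnbinom(2000 * 9, mu = 50, size = 5), ncol = 9)
colnames(counts) <- paste0(rep(c("condA", "condB", "condC"), each = 3), "_day",
                           rep(c("x", "y", "z"), times = 3))

logc <- log2(counts + 1)                  # crude log transform of the counts
coords <- cmdscale(dist(t(logc)), k = 2)  # t(): distances between samples (columns)

plot(coords, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(coords, labels = colnames(logc))     # one label per sample
```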
I've carried out DE analysis with edgeR, but all I want is a simple graph of how different the samples are, regardless of DEGs... I'm a novice, so I've just followed the Trinity/RSEM/edgeR pipeline. Regarding samples, there are no replicates, and I have used the 'no replicates' option available in edgeR.
In that case, just rerun the code you used to run edgeR. Find whatever variable you assigned the output of calcNormFactors to, and call plotMDS() on it, e.g. plotMDS(y) if you called it y.
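A minimal sketch of that workflow, assuming the Bioconductor package edgeR is installed. The simulated 9-sample matrix and its column names are placeholders for your own count matrix; with real data you would build the DGEList from the matrix you already fed into edgeR.

```r
# Requires the Bioconductor package edgeR (which also loads limma)
library(edgeR)

# Simulated counts standing in for the real matrix: 5,000 features x 9 samples
# (3 conditions x 3 days, no replicates); sample names are hypothetical
set.seed(1)
counts <- matrix(rnbinom(5000 * 9, mu = 100, size = 10), ncol = 9)
colnames(counts) <- paste0(rep(c("condA", "condB", "condC"), each = 3), "_day",
                           rep(c("x", "y", "z"), times = 3))

y <- DGEList(counts = counts)   # no group factor needed just for an MDS plot
y <- calcNormFactors(y)         # TMM normalisation factors (the default method)
mds <- plotMDS(y)               # one point per sample (column), labelled by colnames
```

plotMDS() returns the coordinates invisibly, so assigning the result (here mds) lets you replot or inspect them later.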
Still, you need to explain why you have 125,000 rows.
Could they be transcripts, not genes? Isn't 125K transcripts a conceivable number given a large genome and de novo assembly?
Please see below, thanks.
Thanks for the advice, although I'm still trying to understand. I used Trinity to perform a de novo transcriptome assembly; read counts were obtained by realigning the reads back to the transcriptome using RSEM, and the RSEM output was then put into a matrix, normalised, and fed into edgeR. The methodology is published by Haas et al.
Basically, my data span 3 time points (days x, y, z) and 3 conditions, so 9 samples in total. The edgeR step can be run on genes but also on transcripts. As I am looking at functional annotation downstream, I am using transcript expression levels rather than gene levels, because the genes are only identified by names such as comp2132435_seq0, and a gene like this may have 7 isoforms (transcripts). Only the transcripts can be identified, as the genes have no actual sequence attached to them when I extract them from the .fasta file assembled by Trinity. Does this make sense to you guys?
Please use the comment function; do not post a new answer.
Identifying DE features: No biological replicates
http://trinityrnaseq.sourceforge.net/analysis/diff_expression_analysis.html
This is the pipeline I have used... Can you please advise me which file to use if I want to create MA/MDS plots? Do you guys have any good references where I can learn how to use the plotMDS function within RStudio? Thanks.
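For the MA part, a minimal base R sketch comparing two samples is below. It assumes a tab-delimited counts matrix with feature IDs in the first column; the file here is written to a temporary path from simulated counts, and the IDs and sample names are hypothetical. The CPM with a 0.5 offset is a simple assumption to avoid log(0), not edgeR's exact prior-count scheme; edgeR's own plotting functions described in the manual are the more standard route.

```r
# Simulated stand-in for a tab-delimited counts matrix (rows = transcripts,
# first column = IDs); the file, IDs and sample names are hypothetical
set.seed(3)
sim <- matrix(rnbinom(3000 * 2, mu = 80, size = 8), ncol = 2,
              dimnames = list(paste0("comp", 1:3000, "_c0_seq1"), c("dayx", "dayy")))
path <- tempfile(fileext = ".tsv")
write.table(sim, path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back the way you might read your own matrix file
counts <- as.matrix(read.delim(path, row.names = 1, check.names = FALSE))

# Counts per million with a small offset so every value can be logged
cpm <- t(t(counts + 0.5) / (colSums(counts) + 1) * 1e6)
M <- log2(cpm[, "dayx"]) - log2(cpm[, "dayy"])        # log2 fold change
A <- (log2(cpm[, "dayx"]) + log2(cpm[, "dayy"])) / 2  # mean log2 CPM

plot(A, M, pch = 20, cex = 0.4, xlab = "A (mean log2 CPM)", ylab = "M (log2 fold change)")
abline(h = 0, col = "red")
```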
Please look at the links in both answers; the edgeR manual contains clear instructions for how to do this.