Question

MDS plots with R

1

Entering edit mode

10.5 years ago

Biogeek ▴ 470

Can someone give me a simple code so that I can plot an MDS plot for the example data below, I want to compare 9 conditions with a total of 125,000 genes (rows) and expression values. I am new to R and have as much clue as a headless chicken. I am however learning at the moment. Thanks.

X                      LOW1      LOW2       LOW3 -------> 9 conditions total
gene1               0.432       0.231          0.3211
gene2
gene3
gene4
gene5
gene6
gene7
gene8

RNA-Seq MDS-plots Statistics • 21k views

ADD COMMENT • link updated 3.2 years ago by Ram 44k • written 10.5 years ago by Biogeek ▴ 470

0

Entering edit mode

I've carried out DE with EdgeR but all I want to do is express a simple graph of how different the samples are,regardless of D.E.G's... I'm novice hence I've just followed the Trinity/ RSEM/EdgeR pipeline. Regarding samples,there are no replicates and I have used the 'no replicates' option available in EdgeR.

ADD REPLY • link 10.5 years ago by Biogeek ▴ 470

0

Entering edit mode

In that case just reload the code you used to run edgeR. Find whatever variable you called the output of calcNormFactors with and use plotMDS().

e.g.

y <- calcNormFactors(y)
plotMDS(y)

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by gtho123 ▴ 260

0

Entering edit mode

Still you need to explain why you have 125,000 rows.

ADD REPLY • link 10.5 years ago by Michael 55k

0

Entering edit mode

Could they be transcripts, not genes. Isn't 125K transcripts a concievable number given a large genome and de novo assembly.

ADD REPLY • link 10.5 years ago by gtho123 ▴ 260

0

Entering edit mode

Please see below, thanks.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by Biogeek ▴ 470

0

Entering edit mode

Thanks for the advice although I'm still trying to understand. I utilised Trinity to perform a de novo transcriptome assemblage, read counts were provided by realigning the reads back to the transcriptome using RSEM, output from RSEM was then put in a matrix,normalised and fed into EdgeR . The methodology is published by Haas et al.

Basically my data is over 3 time points, day x,y,z. There are 3 conditions, so in essence 9 samples in total. EdgeR does Genes but also transcripts... as I am looking at functional annotation downstream I am using transcript expression levels rather than the genes because all the genes end with the identify of comp2132435_seq0 etc... So for this gene there are possibly 7 isoforms(transcripts). Only the transcripts can be identified as the genes have no actual sequence attached to them when I extract them from the .fasta file assembled by trinity. Does this make sense to you guys?

ADD REPLY • link 10.5 years ago by Biogeek ▴ 470

0

Entering edit mode

Please use the comment function, do not make a new answer.

ADD REPLY • link 10.5 years ago by Michael 55k

0

Entering edit mode

Identifying DE features: No biological replicates

http://trinityrnaseq.sourceforge.net/analysis/diff_expression_analysis.html

This is the pipeline I have used... Can you please advise me now what file to use if I want to creat MA/MDSplots, do you guys have any good references where I can learn how to use the MDSplot function within Rstudio. Thanks.

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 10.5 years ago by Biogeek ▴ 470

0

Entering edit mode

Please look at the links in both answers, edgeR manual contains clear instructions for how to do this.

ADD REPLY • link 10.5 years ago by Michael 55k

Ram · Answer 1 · 2014-06-15

Since this is an RNA-Seq experiment you are probably going to want to do differential expression (DE). When you perform DE with a package like edgeR you can easily draw a MDS plot using the plotMDS() function (which is actually from the limma package).

To do this you need to use tables of counts, not expression values. This could be achieved using samtools.

Give this a read and it should help you out: http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

I don't really want to write the code for you as this is actually really achievable and rewarding if you follow along with the tutorial/vinegette, especially if you are new. Granted if you really have nine different conditions (i.e. none are replicates) it could get complicated but given that the first three are called LOW* this does not appear to be the case.

Other DE packages you could look at include DESeq and limma using voom

EDIT:

I notice from your other questions that you have been using edgeR. This is a good example of "always check the manual" (DO THIS!). However you seem confused about what to put into it so here is a quick run down about making count tables (in the terminal not R).

Map your reads to your reference (i.e. transcriptome, genome ...) using something like BWA MEM
Convert the SAM files to BAM files using samtools view -bS *.sam > *.bam
Sort your BAM files using samtools sort *.bam *_sort
Index your sorted BAM file using samtools index *_sort.bam
Get your count tables using samtools idxstats *_sort.bam > *.txt (Output is gene name - gene length - mapped reads - unmapped reads)
Go into R and combine your mapped reads into one count table
Run edgeR and make your MDS plot

Hope this helps.

Ram · Answer 2 · 2014-06-15

The first approach that comes to mind is to try with R:

plot(cmdscale(dist(iris[,1:4])), col=iris[,5])

BUT: I think you might have too many rows to have a readily applicable MDS problem for scatter-plotting. No single organism has 125k genes, so you are possibly using a concatenation of gene expression measurements from different organisms. You should definitely consider that your problem is ill-posed, whether you could try to reduce the number of rows by filtering genes or annotation of known orthologs.

In any case you should consider the use of smoothScatter for the resulting data, because even for only 30k points, using plot just gives a black blob.

Edit: As you are using edgeR, you can use the function plotMDS() as described in the manual: http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf pp. 63(data preparation: calcNormFactors)-65

This produces a MDS plot of the colums instead of rows, similar to:

plot(cmdscale(dist(t(iris[,1:4]))), type = "n")
text(cmdscale(dist(t(iris[,1:4]))), labels=colnames(iris)[1:4])