Extracting group-specific combinations from distance matrix
3.8 years ago

Hello,

I have a Bray-Curtis distance matrix from phyloseq in R. My dataset consists of 4 groups and 2 repeated measurements (1, 2) from each subject. Now I would like to extract every distance value belonging to a specific group and group comparison, i.e.:

          A        B        C        D
  all     A all    B all    C all    D all
  1       A 1      B 1      C 1      D 1
  2       A 2      B 2      C 2      D 2

Do you have an idea how I could do that? Subsequently I want to build boxplots for each group with ggplot2.

3.8 years ago

Hi,

Assuming that you've done something like:

# import packages and data
library("phyloseq")
data("GlobalPatterns")

# calculate beta-diversity - distance - Bray-Curtis
dist_bray <- distance(GlobalPatterns, method = "bray")
dist_bray_mtx <- as.matrix(dist_bray) # convert 'dist' to 'matrix'

# retrieve the beta-diversity value between samples: "CL3" and "CC1"
dist_bray_mtx["CL3", "CC1"] # 0.5988805

You can convert the dist object into a matrix with as.matrix. If you already have a matrix object (check which class you have with class(object_name)), you can index it as you would any matrix. For instance, to obtain the beta-diversity distance between samples "CL3" and "CC1" (from the example above), you would do: dist_bray_mtx["CL3", "CC1"]. This gives ~0.60.
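The same indexing extends from a single pair to whole groups of samples. Below is a minimal sketch in base R, assuming a toy count table and a hypothetical named vector `group` that maps sample names to groups; Euclidean distance from `dist()` is used only as a stand-in for the Bray-Curtis matrix you would get from phyloseq.

```r
# Toy stand-in for the real data: 6 samples, 2 hypothetical groups.
set.seed(1)
counts <- matrix(rpois(6 * 5, lambda = 10), nrow = 6,
                 dimnames = list(paste0("S", 1:6), paste0("taxon", 1:5)))
group <- c(S1 = "A", S2 = "A", S3 = "A", S4 = "B", S5 = "B", S6 = "B")

# Euclidean distance is only a placeholder for Bray-Curtis here.
d_mtx <- as.matrix(dist(counts))

in_a <- names(group)[group == "A"]
in_b <- names(group)[group == "B"]

# Within group A: take the A x A block and keep each pair once
# (the matrix is symmetric, so the upper triangle is enough).
block_aa <- d_mtx[in_a, in_a]
within_a <- block_aa[upper.tri(block_aa)]

# Between A and B: every entry of the A x B block is a distinct pair.
between_ab <- as.vector(d_mtx[in_a, in_b])
```

With 3 samples per group this yields 3 within-A distances and 9 A-versus-B distances.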

Regarding the boxplot, you can check the following Biostars post that mentions how to do it directly from a phyloseq object: How to generate Beta diversity boxplot from phyloseq object?.

I hope this helps,

António

Hi António,

thanks a lot for your help, I was already following some of your discussions here closely.

Your function runs fine and gives the within-group beta-diversities. What I am also looking for is the between-group beta-diversity.

The goal is to create a figure like fig. 2B in this paper. I would like to have a boxplot from every possible combination of groups to get an overview of my data. Actually, I have the following setup:

  • control (i.e. unaffected body site)
  • disease (can be subset into 2 stages)
      - stage A
      - stage B

Every subject has 2 sampled disease sites as well as 2 unaffected control samples = 4 samples/subject. What I would like to build is a boxplot of the distances for every within-group comparison (like you already did) and every between-group comparison, once for all subjects, once for inter-subject pairs only and once for intra-subject pairs only. Here is a table for better ease of viewing. I know it is a lot of boxplots, but I guess it will give a good first overview for the following analysis.

In the phyloseq sample_data, I have 1 grouping variable which assigns unaffected/disease, 1 grouping variable assigning unaffected/stage A/stage B and of course 1 grouping variable assigning the subject ID. Can I somehow retrieve the corresponding distance values for each of these comparisons?
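One way to think about this is to label every sample pair rather than every sample. The sketch below assumes a hypothetical metadata table `meta` with columns `sample`, `subject`, `status` and `stage` that mirror the three grouping variables described above; the names and values are made up for illustration.

```r
# Hypothetical sample metadata: 2 subjects, 4 samples each.
meta <- data.frame(
  sample  = paste0("S", 1:8),
  subject = rep(c("P1", "P2"), each = 4),
  status  = rep(c("unaffected", "unaffected", "disease", "disease"), times = 2),
  stage   = c("unaffected", "unaffected", "stage A", "stage A",
              "unaffected", "unaffected", "stage B", "stage B"),
  stringsAsFactors = FALSE
)

# All unordered sample pairs (each pair appears exactly once).
pairs <- as.data.frame(t(combn(meta$sample, 2)), stringsAsFactors = FALSE)
names(pairs) <- c("sample1", "sample2")

# Row positions of both pair members in the metadata table.
i <- match(pairs$sample1, meta$sample)
j <- match(pairs$sample2, meta$sample)

# Sort the two labels so that "disease vs unaffected" and
# "unaffected vs disease" end up in the same category.
pairs$status_pair <- paste(pmin(meta$status[i], meta$status[j]),
                           pmax(meta$status[i], meta$status[j]),
                           sep = " vs ")

# Same subject on both sides means an intra-subject comparison.
pairs$subject_pair <- ifelse(meta$subject[i] == meta$subject[j],
                             "intra-subject", "inter-subject")
```

Each row of `pairs` can then be used to look up the distance with `dist_bray_mtx[cbind(pairs$sample1, pairs$sample2)]`, and a stage-level label can be built the same way from `meta$stage`.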

Hi,

To obtain within and between beta-diversity boxplots you might want to try this script: https://rdrr.io/github/jeffkimbrel/jakR/src/R/plotDistances.R.

If the variable that you want to compare in the boxplot is in the sample_data() of your phyloseq object, in principle you just need to pass its name to this function and it should work.

I hope this helps,

António

Hi,

I do not have a single variable holding all possible combinations in sample_data(), since some of the samples belong to several groups at once. E.g. the disease samples overall and subset into stage A and stage B. Or all diseased samples of all subjects combined versus only the intra-subject diseased samples (only the disease samples from one subject).

I guess I can just extract them from the distance matrix I obtained and I could do this by hand, but the result would be biased if I missed a sample. Also, I would need to repeat this when generating more data.

With the provided online function I was able to build boxplots that show within and between beta-diversity of groups compared to one group. I would like to have it all in one plot like here with all the combinations in the table.

I see!

Well, the plot that you want requires a bit of customization. It is not difficult to do, but it takes a bit of experience to make it from the data that you have.

Basically you need to convert the distance matrix to a data frame, tidy the data frame, then add the metadata to this and only then you can plot it with ggplot2.
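A rough sketch of those three steps, using base R for the reshaping and toy data in place of the real phyloseq object. The `meta` table, its `group` column and the Euclidean `dist()` call are assumptions for illustration only; with real data you would start from the Bray-Curtis matrix and your own sample_data.

```r
# Toy data: 6 samples in 2 hypothetical groups.
set.seed(42)
samples <- paste0("S", 1:6)
meta <- data.frame(sample = samples,
                   group  = rep(c("control", "disease"), each = 3),
                   stringsAsFactors = FALSE)
counts <- matrix(rpois(6 * 5, lambda = 10), nrow = 6,
                 dimnames = list(samples, NULL))
d_mtx <- as.matrix(dist(counts))  # placeholder for the Bray-Curtis matrix

# 1. Matrix -> long data frame, keeping each pair once (upper triangle).
idx <- which(upper.tri(d_mtx), arr.ind = TRUE)
long <- data.frame(
  sample1  = rownames(d_mtx)[idx[, "row"]],
  sample2  = colnames(d_mtx)[idx[, "col"]],
  distance = d_mtx[idx],
  stringsAsFactors = FALSE
)

# 2. Add the metadata of both samples and derive a comparison label.
g1 <- meta$group[match(long$sample1, meta$sample)]
g2 <- meta$group[match(long$sample2, meta$sample)]
long$comparison <- ifelse(g1 == g2,
                          paste("within", g1),
                          paste(pmin(g1, g2), "vs", pmax(g1, g2)))

# 3. Plot one box per comparison (only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = comparison, y = distance)) +
    ggplot2::geom_boxplot()
  # print(p) draws the figure
}
```

Because a pair can belong to several categories (for example "all subjects" and "intra-subject"), extra categories can be added by stacking filtered copies of `long` with `rbind()` before plotting.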

You can try to make it yourself. If I have time I'll try to implement it in the other script (without promises).

Thanks a lot for your kind help!! I will also try my best to clean/arrange the data.frame.

I guess tidy data.frames are done with tidyr from tidyverse?
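tidyr is one option for the reshaping step (for example with `pivot_longer()`), but it is not required. As a hedged alternative, the sketch below does the same wide-to-long conversion in base R with `as.table()`; the matrix `m` is a made-up symmetric example, not real distances.

```r
# Made-up symmetric 4 x 4 "distance" matrix for illustration.
samples <- paste0("S", 1:4)
m <- matrix(0, nrow = 4, ncol = 4, dimnames = list(samples, samples))
m[upper.tri(m)] <- c(0.2, 0.5, 0.4, 0.7, 0.6, 0.3)
m <- m + t(m)  # mirror the upper triangle into the lower one

# Wide matrix -> long data frame with one row per cell.
long <- as.data.frame(as.table(m), responseName = "distance")
names(long)[1:2] <- c("sample1", "sample2")

# Keep each unordered pair once and drop the zero diagonal.
# The factor levels follow the matrix order, so integer codes can be compared.
long <- long[as.integer(long$sample1) < as.integer(long$sample2), ]
```

The resulting `long` has one row per sample pair, which is the shape ggplot2 expects.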
