merge samples in phyloseq returns NAs
4.4 years ago
annaA

Hello, I am working with a dataset that contains several categorical variables in the metadata (sample_data). I want to merge some replicates, and in later steps I may need to merge other samples too. To do so I am using the merge_samples function as follows, where:

f_data: the original phyloseq object
"sample": the sample ID, which is identical for replicates

test <- merge_samples(x=f_data, group="sample")

The metadata before merging looks like this:

Sample Data:        [237 samples by 15 sample variables]:
     group sample   ring_no         species    sex rearing_nest family_membership  sampling index_1 index_2
G1      G1  FD001  White247 Bengalese_Finch   Male         VS13            Father Fostering    S504    N708
G10    G10  FD026  Grey1304     Zebra_Finch   Male         KE05            Father Fostering    S507    N712
G100  G100  FD250  Grey1302     Zebra_Finch   Male         KE03            Father    Day_10    S503    N706
G101  G101  FD252  Grey1302     Zebra_Finch   Male         KE03            Father   Day_100    S504    N712
G102  G102  FD256  Grey1322     Zebra_Finch Female         KE03            Mother    Day_35    S507    N710
G103  G103  FD270 Silver179 Bengalese_Finch Female         VS13            Mother    Day_10    S508    N705

and after merging, like this:

Sample Data:        [234 samples by 16 sample variables]:
      group sample ring_no species sex rearing_nest family_membership sampling index_1 index_2 breading_no
FD001    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD002    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD003    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD004    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD005    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD006    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD007    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD008    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD009    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD026    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD027    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD028    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0
FD029    NA     NA      NA      NA  NA           NA                NA       NA      NA      NA         1.0

I would appreciate any help on how to solve this problem.
A
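The all-NA columns can be reproduced without phyloseq. The sketch below is a minimal base-R approximation that assumes merge_samples() coerces every sample_data column to numeric and then averages it per group with mean (which is what the phyloseq source appears to do); the toy data frame and its values are hypothetical. Character columns turn into NA during that coercion, while integer columns such as breading_no survive, matching the merged output above.

```r
# Hypothetical toy metadata: two replicates of FD001 and one FD002 sample
meta <- data.frame(
  sample      = c("FD001", "FD001", "FD002"),
  species     = c("Zebra_Finch", "Zebra_Finch", "Bengalese_Finch"),
  breading_no = c(1L, 1L, 2L),
  stringsAsFactors = FALSE  # the default since R 4.0.0
)
group <- factor(meta$sample)

# Assumed merge_samples() behaviour, step 1: coerce every column to numeric.
# Character columns become NA ("NAs introduced by coercion" warnings silenced)
num <- suppressWarnings(sapply(meta, as.numeric))

# Step 2: average each column within each group.
# sample and species come back as NA; breading_no keeps its values
merged <- aggregate(as.data.frame(num), by = list(group = group), FUN = mean)
print(merged)
```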

R phyloseq biom

Hi,

Sorry, can you edit your post? I can't read the table because it is not formatted properly.

Just to confirm: in the metadata file you provided to phyloseq, do you have a column named sample?

Can you run the following and post the result here?

str(sample_data(f_data))

This may be related to the fact that the sample variable in your sample_data(f_data) is not a factor. If that is the case, you can pass a factor to the group argument of merge_samples() by doing:

fct_group_var <- factor( unlist(sample_data(f_data)[,"sample"]) )
test <- merge_samples(x=f_data, group=fct_group_var)

This should work. Let me know if it does.

António


Hey, thanks for your reply.

So the output of str(sample_data(f_data)) is the following:

'data.frame':   237 obs. of  15 variables:
Formal class 'sample_data' [package "phyloseq"] with 4 slots
  ..@ .Data    :List of 15
  .. ..$ : chr  "G1" "G10" "G100" "G101" ...
  .. ..$ : chr  "FD001" "FD026" "FD250" "FD252" ...
  .. ..$ : chr  "White247" "Grey1304" "Grey1302" "Grey1302" ...
  .. ..$ : chr  "Bengalese_Finch" "Zebra_Finch" "Zebra_Finch" "Zebra_Finch" ...
  .. ..$ : chr  "Male" "Male" "Male" "Male" ...
  .. ..$ : chr  "VS13" "KE05" "KE03" "KE03" ...
  .. ..$ : chr  "Father" "Father" "Father" "Father" ...
  .. ..$ : chr  "Fostering" "Fostering" "Day_10" "Day_100" ...
  .. ..$ : chr  "S504" "S507" "S503" "S504" ...
  .. ..$ : chr  "N708" "N712" "N706" "N712" ...
  .. ..$ : int  1 1 1 1 1 1 1 1 1 2 ...
  .. ..$ : int  592 799 799 799 613 613 613 613 613 799 ...
  .. ..$ : int  6 4 3 3 3 6 4 11 11 14 ...
  .. ..$ : int  2 1 1 1 1 2 1 2 2 1 ...
  .. ..$ : logi  NA NA NA NA NA NA ...
  ..@ names    : chr  "group" "sample" "ring_no" "species" ...
  ..@ row.names: chr  "G1" "G10" "G100" "G101" ...
  ..@ .S3Class : chr "data.frame"

I ran the code you suggested, but I still get the warning message and the NAs in the metadata.


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.


Sure! Next time I'll do it right.


So, from what I see in the metadata table before merging your phyloseq object, the variable sample only holds distinct sample names.

What do you want to merge? Different replicates, I believe, right? If so, you need a factor variable in your metadata file that has the same value for all replicates belonging to the same sample. From your example it is not clear to me that you have such a variable. Just to be sure, consider this example:

Sample  Group
ctrl_1  control
ctrl_2  control
ctrl_3  control
test_1  test
test_2  test
test_3  test

In this case, your phyloseq object has one entry per Sample and you want to merge by Group, i.e., merge the replicates of the control and test conditions. Are you sure that your metadata contains this information?

If it does, I cannot explain the error.
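A quick way to check whether a column can serve as a grouping variable is to count how many samples share each value. The sketch below rebuilds the toy Sample/Group table above as a plain data frame (values copied from that example, not from real data) and uses base R table() and duplicated().

```r
# Toy metadata copied from the Sample/Group example
meta <- data.frame(
  Sample = c("ctrl_1", "ctrl_2", "ctrl_3", "test_1", "test_2", "test_3"),
  Group  = c("control", "control", "control", "test", "test", "test"),
  stringsAsFactors = FALSE
)

# Number of samples per value: a useful grouping variable has counts > 1
reps <- table(meta$Group)
print(reps)

# Sample is unique per row, so merging by it would leave the data unchanged;
# Group repeats, so merging by it collapses the six rows into two
sample_repeats <- any(duplicated(meta$Sample))
group_repeats  <- any(duplicated(meta$Group))
```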

António


In my data there are 3 samples with the same value in "sample", say FDR0033. So I thought that merging by "sample" would work. But from what you are saying, this is not correct. So do I need to add a new variable "x" in which I give the same name to these 3 samples only, and then merge the object by "x"? Am I right? Sorry if this is a silly question; this is the first time I am dealing with this kind of analysis.
A
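Before merging, one way to confirm which IDs actually repeat, and how many samples should remain afterwards, is sketched below. The data frame is hypothetical (only the ID FDR0033 comes from the post).

```r
# Hypothetical metadata: FDR0033 appears three times, the others once
meta <- data.frame(
  sample = c("FDR0033", "FDR0033", "FDR0033", "FD001", "FD002"),
  stringsAsFactors = FALSE
)

counts <- table(meta$sample)

# IDs shared by more than one row, i.e. the replicates that will be merged
replicated_ids <- names(counts)[counts > 1]
print(replicated_ids)

# Samples expected after merging by "sample": one per distinct ID
n_after <- length(unique(meta$sample))
print(n_after)
```

With real data, the metadata can first be pulled out of the phyloseq object with as(sample_data(f_data), "data.frame"); n_after should then equal the number of samples reported after merge_samples(), since each distinct ID becomes one merged sample.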


In that case, it seems that you're doing everything right.

I just tested it with the GlobalPatterns dataset that comes with phyloseq, and it works fine.

library(phyloseq)
data(GlobalPatterns)

## Print sample data:
sample_data(GlobalPatterns)

Sample Data:        [26 samples by 7 sample variables]:
     X.SampleID  Primer Final_Barcode Barcode_truncated_plus_T Barcode_full_length         SampleType                                  Description
CL3             CL3 ILBC_01        AACGCA                   TGCGTT         CTAGCGTGCGT               Soil     Calhoun South Carolina Pine soil, pH 4.9
CC1             CC1 ILBC_02        AACTCG                   CGAGTT         CATCGACGAGT               Soil     Cedar Creek Minnesota, grassland, pH 6.1
SV1             SV1 ILBC_03        AACTGT                   ACAGTT         GTACGCACAGT               Soil   Sevilleta new Mexico, desert scrub, pH 8.3
M31Fcsw     M31Fcsw ILBC_04        AAGAGA                   TCTCTT         TCGACATCTCT              Feces      M3, Day 1, fecal swab, whole body study
M11Fcsw     M11Fcsw ILBC_05        AAGCTG                   CAGCTT         CGACTGCAGCT              Feces     M1, Day 1, fecal swab, whole body study 
M31Plmr     M31Plmr ILBC_07        AATCGT                   ACGATT         CGAGTCACGAT               Skin      M3, Day 1, right palm, whole body study
M11Plmr     M11Plmr ILBC_08        ACACAC                   GTGTGT         GCCATAGTGTG               Skin     M1, Day 1, right palm, whole body study 
F21Plmr     F21Plmr ILBC_09        ACACAT                   ATGTGT         GTAGACATGTG               Skin    F1, Day 1,  right palm, whole body study 

a <- merge_samples(GlobalPatterns, group = factor(as.character(unlist(sample_data(GlobalPatterns)[,"SampleType"]))))

sample_data(a)

Sample Data:        [9 samples by 7 sample variables]:
               X.SampleID Primer Final_Barcode Barcode_truncated_plus_T Barcode_full_length
Feces                    19.0   13.5          13.5                16.500000           13.750000
Freshwater               15.0   11.5          11.5                12.000000            4.500000
Freshwater (creek)        2.0   14.0          14.0                13.000000            6.666667
Mock                      7.0   25.0          25.0                12.333333           16.000000
Ocean                    18.0   17.0          17.0                13.666667           17.000000
Sediment (estuary)       23.0   20.0          20.0                15.000000           14.666667
Skin                     12.0    7.0           7.0                 9.666667           14.666667
Soil                     10.0    2.0           2.0                13.333333           11.333333
Tongue                   14.5    9.5           9.5                15.000000           23.000000

As you can see, it works fine. Which phyloseq version are you using?

António


Yes, but do you see that some of the variables have changed? For example, the variable "Primer" in the new object has been converted to numeric values.

I am using version 1.32.0

A


Yes, I understand what you're saying. But in this case it makes sense: if different samples have different primers and you merge those samples, the primer information becomes meaningless, because it cannot be merged.
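The numeric Primer values can be reproduced in base R: when a factor column is coerced to numeric it becomes its integer level codes, and averaging those codes per group yields numbers that have no meaning as primers. This is presumably how values such as 13.5 in the Primer column above arise. The primer labels and sample types below are hypothetical.

```r
# Hypothetical factor columns: four samples of two types
primer <- factor(c("ILBC_01", "ILBC_02", "ILBC_04", "ILBC_05"))
type   <- factor(c("Soil", "Soil", "Feces", "Feces"))

# Coercing a factor to numeric yields its level codes (1, 2, 3, 4), not labels
codes <- as.numeric(primer)

# Averaging the codes per group produces meaningless "primer" values
res <- aggregate(codes, by = list(type = type), FUN = mean)
print(res)
```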

My version is 1.30.0. Can you try to downgrade?

António


Dear annaA, did you solve this problem? I have the exact same issue, which I'm trying to fix now: NAs appear in the sample variables after merge_samples. Thanks a lot for any help in this regard.


I noticed this started happening after updating R to version 4.0 or higher. I believe it is related to the base R 4.0.0 change whereby data.frame() and read.table() no longer convert strings to factors by default (stringsAsFactors = FALSE).

Converting the columns to factors gets me about halfway there: they are no longer NA, but they remain encoded as integers. The old workaround of reassigning the factor labels afterwards no longer works, for reasons I can't quite figure out.
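One possible workaround, sketched below in base R, is to rebuild the categorical metadata yourself instead of relying on the averaged table. For each grouping level, it keeps a column's value when that value is identical across all merged replicates and sets it to NA otherwise. collapse_metadata() is a hypothetical helper written for this thread, not a phyloseq function, and the toy data are made up. Unlike merge_samples(), it does not average numeric columns; they are kept only where constant.

```r
# Hypothetical helper (not part of phyloseq): one row per group, keeping a
# column's value only where it is constant within that group, otherwise NA
collapse_metadata <- function(meta, group) {
  g <- factor(meta[[group]])
  rows <- split(seq_len(nrow(meta)), g)  # row indices per group, in level order
  out <- lapply(meta, function(col) {
    vals <- lapply(rows, function(i) {
      u <- unique(col[i])
      if (length(u) == 1L) u else col[NA_integer_]  # NA of the column's type
    })
    unlist(vals, use.names = FALSE)
  })
  out <- data.frame(out, stringsAsFactors = FALSE, check.names = FALSE)
  rownames(out) <- levels(g)  # one row name per grouping level
  out
}

# Toy metadata: three replicates of FD001 sequenced with different indices
meta <- data.frame(
  sample  = c("FD001", "FD001", "FD001", "FD002"),
  species = c("Zebra_Finch", "Zebra_Finch", "Zebra_Finch", "Bengalese_Finch"),
  index_1 = c("S504", "S507", "S503", "S504"),
  stringsAsFactors = FALSE
)

res <- collapse_metadata(meta, "sample")
print(res)
```

To attach the result to a merged object, something along the lines of sample_data(test) <- sample_data(collapse_metadata(as(sample_data(f_data), "data.frame"), "sample")) should work. This assumes that merge_samples() names the merged samples after the levels of the grouping factor, as its source appears to do. This step is untested here.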
