Question

Difference in means between sexes over time

4.4 years ago

selplat21 ▴ 20

Hello,

I have measurements of various phenotypes for males and females across time. I began by binning my data in 20-year increments.

Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)

This assigns every individual in my dataset a bin number (cuts) from 1 to 8.

I am then trying to output, for each time bin, the difference in mean trait value between males and females.
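As a side note on the binning step, here is a minimal sketch with made-up years (the `year` vector is hypothetical, not the poster's data). With the default arguments, `cut()` builds intervals that are open on the left, so an observation equal to the smallest break gets `NA` unless `include.lowest = TRUE` is set. `cut()` also rejects duplicated breaks, which can happen when `max(year)` already lies on the 20-year grid, so the sketch wraps the breaks in `unique()` as a precaution.

```r
# Hypothetical years, chosen so that max(year) falls exactly on the 20-year grid
year <- c(1860, 1875, 1880, 1901, 1999, 2000)

# unique() drops the repeated final break (2000 would otherwise appear twice)
breaks <- unique(c(seq(min(year), max(year), 20), max(year)))

# Default: intervals are (a, b], so the minimum year is not in any bin
without_lowest <- cut(year, breaks = breaks, labels = FALSE)
print(without_lowest)  # NA 1 1 3 7 7

# include.lowest = TRUE closes the first interval on the left
with_lowest <- cut(year, breaks = breaks, labels = FALSE, include.lowest = TRUE)
print(with_lowest)     # 1 1 1 3 7 7
```

Whether this matters for a given dataset depends on how many individuals share the earliest year.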

for (i in 1:8) {
  difmean <- c()
  Mcuts <- DataM[ which(DataM$cuts=='i'),]
  Fcuts <- DataF[ which(DataF$cuts=='i'),]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  difmean <- c(Mmean-Fmean)
  print (difmean)
}

I get an output of the following:

[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN

Any help would be greatly appreciated!

R • 1.2k views

ADD COMMENT • link 4.4 years ago by selplat21 ▴ 20

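Got it: in DataM$cuts=='i' you compare against the string 'i' instead of the loop variable i. cuts is never the string 'i', so every subset is empty and the mean of an empty vector is NaN.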

ADD REPLY • link 4.4 years ago by Asaf 10k
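To illustrate the point above with a toy example (the vectors `cuts` and `trait` are made up, not the poster's data): comparing against the string `'i'` matches nothing, so the subset is empty and its mean is `NaN`, while comparing against the loop variable `i` selects the intended rows.

```r
# Toy data: bin numbers and trait values (hypothetical)
cuts  <- c(1, 2, 2, 3)
trait <- c(10, 20, 30, 40)
i <- 2

# Quoted: the numbers are compared with the literal string "i", never TRUE
quoted <- trait[which(cuts == 'i')]
mean_quoted <- mean(quoted, na.rm = TRUE)
print(length(quoted))  # 0
print(mean_quoted)     # NaN

# Unquoted: compares with the current value of i
unquoted <- trait[which(cuts == i)]
mean_unquoted <- mean(unquoted, na.rm = TRUE)
print(mean_unquoted)   # 25
```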

Thank you!! It is working now, much appreciated.

ADD REPLY • link 4.4 years ago by selplat21 ▴ 20

Is there a way to assess significance of a linear model with binned data? I pasted some code below that generates the regression line, but I don't get p-values from the summary. Maybe I need to bootstrap and just look at confidence intervals?

ADD REPLY • link 4.4 years ago by selplat21 ▴ 20

I think you should start a new thread for that question

ADD REPLY • link 4.4 years ago by Asaf 10k

Do Data and DataM and DataF have the same number of rows? Is trait a column in DataM and DataF?

ADD REPLY • link 4.4 years ago by Asaf 10k

DataM and DataF have different numbers of rows, but the same columns. trait is a column in both datasets.

DataM and DataF were generated like so:

DataM <- Data[which(Data$sex=="M"),]
DataF <- Data[which(Data$sex=="F"),]

ADD REPLY • link updated 4.4 years ago by Ram 44k • written 4.4 years ago by selplat21 ▴ 20

Side note: Why use which() when just specifying DataM<-Data[Data$sex=="M",] would work just fine?

ADD REPLY • link 4.4 years ago by Ram 44k
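One case where the two forms differ, sketched with a made-up data frame (the `sex` and `trait` values are hypothetical): if the column being tested contains `NA`, plain logical indexing returns an extra all-`NA` row for each missing value, whereas `which()` silently drops them. If `Data$sex` has no missing values, both forms give the same result.

```r
# Hypothetical data with one missing sex value
Data <- data.frame(
  sex   = c("M", NA, "F", "M"),
  trait = c(1.0, 2.0, 3.0, 4.0)
)

# Logical indexing: the NA in the condition yields a row of NAs
by_logical <- Data[Data$sex == "M", ]
print(nrow(by_logical))  # 3

# which() returns only the positions that are TRUE
by_which <- Data[which(Data$sex == "M"), ]
print(nrow(by_which))    # 2
```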

You're right, it was just how I left it during processing.

ADD REPLY • link 4.4 years ago by selplat21 ▴ 20

score 0 · Answer 1 · 2020-06-25

Update:

I was able to loop through the bins and compute the mean difference, the sample size for each sex, and the total sample size.

Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)

DataM <- Data[Data$sex=="M",]
DataF <- Data[Data$sex=="F",]

mean.df <- as.data.frame(c())

for (i in 2:8) {
  Mcuts <- DataM[which(DataM$cuts==i),]
  Fcuts <- DataF[which(DataF$cuts==i),]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  mean.df[i, "bin"] <- paste(i)
  mean.df[i, "mean_dif"] <- paste(Mmean-Fmean)
  mean.df[i, "ss_f"] <- paste(length(Fcuts$cuts))
  mean.df[i, "ss_m"] <- paste(length(Mcuts$cuts))
  mean.df[i, "ss_t"] <- paste(sum(length(Fcuts$cuts),length(Mcuts$cuts)))
  }

lm1 <- lm(mean_dif ~ bin, data=mean.df)
plot(mean.df$bin, mean.df$mean_dif)
abline(lm1)
summary(lm1)

Unfortunately, summary() does not report p-values for this lm() fit, which I assume is because the data are binned. Is there a way to assess the significance of the above trendline with binned data while accounting for the different sample sizes of the bins?
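A possible explanation, assuming mean.df is built exactly as in the loop above: `paste()` returns character strings, so `bin` enters the model as a categorical predictor with one level per bin. That uses up every degree of freedom and leaves nothing from which to estimate standard errors, which would explain the missing p-values. The sketch below uses made-up per-bin values (not the poster's data) to show the difference. It also passes the total sample size as `weights`, which is one simple way to let larger bins count more; whether that weighting suits this dataset is an assumption.

```r
# Hypothetical per-bin summary; all numbers are invented for illustration
mean.df <- data.frame(
  bin      = 2:8,
  mean_dif = c(1.9, 2.4, 2.2, 3.1, 3.0, 3.8, 4.1),
  ss_t     = c(12, 30, 25, 40, 18, 22, 35)
)

# Character bin (what paste() produces): lm() treats it as a factor,
# so there is one coefficient per bin and no residual degrees of freedom
chr.df <- mean.df
chr.df$bin <- as.character(chr.df$bin)
lm_chr <- lm(mean_dif ~ bin, data = chr.df)
print(df.residual(lm_chr))  # 0

# Numeric bin: a single slope term, weighted by total sample size per bin
lm_num <- lm(mean_dif ~ bin, data = mean.df, weights = ss_t)
print(df.residual(lm_num))  # 5
print(coef(summary(lm_num)))  # coefficient table with a "Pr(>|t|)" column
```

In the original loop, dropping the `paste()` calls (for example `mean.df[i, "bin"] <- i`) should keep the columns numeric. An alternative that avoids binning altogether would be to fit the individual-level data with a sex-by-year interaction, but that is a different model from the one in this post.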