Difference in means between sexes over time
4.4 years ago
selplat21 ▴ 20

Hello,

I have measurements of various phenotypes for males and females across time. I began by binning my data in 20-year increments.

Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)

This assigns every individual in my dataset a bin index from 1 to 8.
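A caveat on the cut() call above, shown with made-up years: by default cut() builds intervals that are open on the left, so the observation(s) at min(year) fall outside the first bin and get NA unless include.lowest = TRUE is set. Also, if max(year) happens to land exactly on the 20-year sequence, the appended max(year) duplicates a break and cut() stops with "'breaks' are not unique"; wrapping the breaks in unique() avoids that. The vector `year` below is a hypothetical stand-in for Data$year.

```r
# Made-up years standing in for Data$year
year <- c(1900, 1905, 1920, 1939, 1940, 1955)

# Same break construction as in the post, with unique() guarding against
# a duplicated final break
breaks <- unique(c(seq(min(year), max(year), 20), max(year)))

# Default: intervals are (a, b], so the minimum year is not in any bin
bins_default <- cut(year, breaks = breaks, labels = FALSE)

# include.lowest = TRUE closes the first interval on the left
bins_closed <- cut(year, breaks = breaks, labels = FALSE,
                   include.lowest = TRUE)

print(bins_default)  # NA 1 1 2 2 3
print(bins_closed)   #  1 1 1 2 2 3
```

Checking table(Data$cuts, useNA = "ifany") after binning is a quick way to see whether any rows ended up without a bin.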

I am then trying to output the difference in the mean of a trait between males and females for each time bin.

for (i in 1:8) {
  difmean <- c()
  Mcuts <- DataM[ which(DataM$cuts=='i'),]
  Fcuts <- DataF[ which(DataF$cuts=='i'),]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  difmean <- c(Mmean-Fmean)
  print (difmean)
}

I get an output of the following:

[1] NaN [1] NaN [1] NaN [1] NaN [1] NaN [1] NaN [1] NaN [1] NaN

Any help would be greatly appreciated!

R • 1.2k views

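Got it: you compare against the string 'i' instead of the loop variable i in DataM$cuts=='i'. cuts is never equal to the string 'i', so both subsets are empty and mean() returns NaN every time.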
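For reference, a minimal sketch of the loop with the unquoted i, run on made-up data (the two small data frames are hypothetical stand-ins for DataM and DataF). It also allocates difmean once before the loop, since resetting it inside the loop discards the earlier bins.

```r
# Toy data standing in for the poster's DataM/DataF (values are made up)
DataM <- data.frame(cuts = c(1, 1, 2, 2), trait = c(10, 12, 14, 16))
DataF <- data.frame(cuts = c(1, 1, 2, 2), trait = c(9, 9, 10, 12))

n_bins <- 2
difmean <- numeric(n_bins)  # allocate once, outside the loop

for (i in seq_len(n_bins)) {
  # Compare against the loop variable i, not the string 'i'
  Mmean <- mean(DataM$trait[DataM$cuts == i], na.rm = TRUE)
  Fmean <- mean(DataF$trait[DataF$cuts == i], na.rm = TRUE)
  difmean[i] <- Mmean - Fmean
}

print(difmean)  # 2 4
```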


Thank you!! It is working now, much appreciated.


Is there a way to assess significance of a linear model with binned data? I pasted some code below that generates the regression line, but I don't get p-values from the summary. Maybe I need to bootstrap and just look at confidence intervals?


I think you should start a new thread for that question


Do Data and DataM and DataF have the same number of rows? Is trait a column in DataM and DataF?


DataM and DataF have different numbers of rows, but the same columns. $trait is a column in both datasets.

DataM and DataF were generated like so:

DataM <- Data[which(Data$sex=="M"),]
DataF <- Data[which(Data$sex=="F"),]

Side note: Why use which() when just specifying DataM<-Data[Data$sex=="M",] would work just fine?
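One case where the two forms differ, sketched on a made-up data frame: if the sex column contains NA, the logical comparison yields NA for those rows, and indexing with NA returns an all-NA row, whereas which() silently drops them. With no missing values in sex, both forms give the same result.

```r
# Made-up data with one missing sex value
Data <- data.frame(sex = c("M", "F", NA, "M"),
                   trait = c(1.0, 2.0, 3.0, 4.0))

by_logical <- Data[Data$sex == "M", ]        # NA in the index -> all-NA row
by_which   <- Data[which(Data$sex == "M"), ] # which() drops the NA

print(nrow(by_logical))  # 3
print(nrow(by_which))    # 2
```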


You're right, it was just how I left it during processing.

4.4 years ago
selplat21 ▴ 20

Update:

I was able to loop through and provide a mean difference, sample size for each sex, and total sample size.

Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)

DataM <- Data[Data$sex=="M",]
DataF <- Data[Data$sex=="F",]

mean.df <- as.data.frame(c())

for (i in 2:8) {
  Mcuts <- DataM[which(DataM$cuts==i),]
  Fcuts <- DataF[which(DataF$cuts==i),]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  mean.df[i, "bin"] <- paste(i)
  mean.df[i, "mean_dif"] <- paste(Mmean-Fmean)
  mean.df[i, "ss_m"] <- paste(length(Mcuts$cuts))
  mean.df[i, "ss_f"] <- paste(length(Fcuts$cuts))
  mean.df[i, "ss_t"] <- paste(sum(length(Fcuts$cuts),length(Mcuts$cuts)))
  }

lm1 <- lm(mean_dif ~ bin, data=mean.df)
plot(mean.df$bin, mean.df$mean_dif)
abline(lm1)
summary(lm1)

Unfortunately, because this is binned data, the lm() command is unable to produce p-values. Is there a way to assess significance of the above trendline with binned data and account for the different sample sizes of bins?
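One likely contributor to the missing p-values, judging only from the code as posted: paste() returns character strings, so bin and mean_dif are stored as text rather than numbers, and lm() then treats bin as a categorical variable with one level per row, leaving no residual degrees of freedom. The sketch below uses simulated data (not the poster's dataset) and keeps every column numeric; it then shows two possible approaches, a fit on the bin summaries weighted by total sample size, and a fit on the raw rows where the year:sex interaction tests whether the male-female difference changes over time. The column names follow the post; everything else is an assumption for illustration.

```r
set.seed(1)  # simulated data only; not the poster's dataset
Data <- data.frame(
  year  = sample(1860:2019, 400, replace = TRUE),
  sex   = sample(c("M", "F"), 400, replace = TRUE),
  trait = rnorm(400, mean = 50, sd = 5)
)
# Eight 20-year bins, closed on the left: [1860,1880), ..., [2000,2020)
Data$cuts <- cut(Data$year, breaks = seq(1860, 2020, 20),
                 labels = FALSE, right = FALSE)

# One row per bin, with every column kept numeric (no paste())
mean.df <- do.call(rbind, lapply(sort(unique(Data$cuts)), function(i) {
  m <- Data$trait[Data$sex == "M" & Data$cuts == i]
  f <- Data$trait[Data$sex == "F" & Data$cuts == i]
  data.frame(bin = i,
             mean_dif = mean(m, na.rm = TRUE) - mean(f, na.rm = TRUE),
             ss_m = length(m), ss_f = length(f),
             ss_t = length(m) + length(f))
}))

# Weighted fit on the bin summaries: larger bins count for more
lm_binned <- lm(mean_dif ~ bin, data = mean.df, weights = ss_t)
print(summary(lm_binned)$coefficients)

# Alternative on the raw rows: the year:sex term asks whether the
# male-female difference changes with time
lm_raw <- lm(trait ~ year * sex, data = Data)
print(summary(lm_raw)$coefficients)
```

Weighting by ss_t is a rough choice. Because the variance of a difference in means scales with 1/ss_m + 1/ss_f, weights of 1 / (1/ss_m + 1/ss_f) may be a closer match when the two sexes are unevenly sampled within a bin. With only eight bins the binned fit has few degrees of freedom, so the raw-data model is usually the more informative of the two.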
