Question

Is a chi-square test suitable for my categorical data?

21 months ago

Apex92 ▴ 320

Dear all,

I have a two-column data frame (its head is shown below), and both columns contain strings. The first column gives the location of each gene, either "5p" or "3p". The second column records whether a "FrameShift" or "No FrameShift" occurs at that location.

head(df)

Location       type
3p         No FrameShift
5p            FrameShift
3p         No FrameShift
5p         No FrameShift
3p         No FrameShift
3p            FrameShift

In total, I have 80 rows. I made a barplot whose x-axis has the two categories "FrameShift" and "No FrameShift", with bar heights showing the counts of 5p and 3p within each category. I want a statistical test of whether the difference between the 5p and 3p counts is significant within the "FrameShift" category and, separately, within the "No FrameShift" category.

I came up with the chi-square tests below. Do you think they are suitable for my data?

# chisq.test() comes from the stats package, which R attaches by default (dplyr is not needed)
library(stats)

# Convert to a 2 x 2 table: rows = Location (3p, 5p), columns = type (FrameShift, No FrameShift)
contingency_table <- table(df$Location, df$type)

# Chi-square goodness-of-fit test for each comparison against equal expected counts:
# 3p vs 5p within FrameShift (column 1) and 3p vs 5p within No FrameShift (column 2)
chi2_result_1 <- chisq.test(contingency_table[1:2, 1])
chi2_result_2 <- chisq.test(contingency_table[1:2, 2])

# p-values
p_value_comparison_1 <- chi2_result_1$p.value
p_value_comparison_2 <- chi2_result_2$p.value
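To run this code without the original data, a hypothetical df can be rebuilt from the counts in the contingency table posted later in this thread (34 + 16 + 18 + 12 = 80 rows). This is a minimal sketch under that assumption; the row order is arbitrary and only the counts matter. It also shows why [1:2, 1] picks out the FrameShift counts: table() orders the levels alphabetically.

```r
# Hypothetical stand-in for df, rebuilt from the posted counts (80 rows in total)
df <- data.frame(
  Location = c(rep("3p", 34), rep("5p", 16), rep("3p", 18), rep("5p", 12)),
  type     = c(rep("FrameShift", 50), rep("No FrameShift", 30))
)

# table() sorts levels alphabetically: rows 3p, 5p; columns FrameShift, No FrameShift
contingency_table <- table(df$Location, df$type)
print(contingency_table)

# Column 1 is FrameShift, so [1:2, 1] holds the 3p and 5p counts in that category
print(contingency_table[1:2, 1])
```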

Thank you in advance.

statistics r • 672 views

I do not see any problem with your comparisons. A chi-square goodness-of-fit test can be used even for a single variable, comparing observed and expected frequencies across that variable's categories.
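As a sketch of what the goodness-of-fit test computes, the statistic can be worked out by hand and compared with chisq.test(). This assumes equal expected proportions (the chisq.test() default for a plain vector) and uses the FrameShift counts from the table in this thread; the variable names are illustrative only.

```r
# Observed 3p and 5p counts within FrameShift, from the table in this thread
observed <- c("3p" = 34, "5p" = 16)

# Null hypothesis: both locations are equally likely, so each expected count is n / k
expected <- rep(sum(observed) / length(observed), length(observed))

# Goodness-of-fit statistic: sum over categories of (O - E)^2 / E
x_squared <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with degrees of freedom = number of categories - 1
p_value <- pchisq(x_squared, df = length(observed) - 1, lower.tail = FALSE)

# chisq.test() on a plain vector runs the same test against equal proportions
builtin <- chisq.test(observed)

cat("by hand:   ", x_squared, p_value, "\n")
cat("chisq.test:", unname(builtin$statistic), builtin$p.value, "\n")
```

Here each expected count is 50 / 2 = 25, so the statistic is (34 - 25)^2/25 + (16 - 25)^2/25 = 6.48 with 1 degree of freedom.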

Reply by bk11 ★ 3.1k • 21 months ago

Thank you for your comment. My contingency table looks like this:

       FrameShift  No FrameShift
  3p         34            18
  5p         16            12

I aimed to calculate a p-value for the difference between the 3p and 5p counts in the FrameShift column (34 vs 16) and in the No FrameShift column (18 vs 12). Based on your comment, I think my approach is correct, right?
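A minimal sketch that enters the posted 2 x 2 counts directly as a matrix and runs the same per-column goodness-of-fit tests as the question's code. The matrix layout and variable names are assumptions made for illustration.

```r
# The posted counts; matrix() fills column-wise, so column 1 = FrameShift (34, 16)
counts <- matrix(c(34, 16, 18, 12), nrow = 2,
                 dimnames = list(Location = c("3p", "5p"),
                                 type = c("FrameShift", "No FrameShift")))

# One goodness-of-fit test per column (3p vs 5p within each type),
# each against equal expected counts, as in the question's code
results <- lapply(colnames(counts), function(col) chisq.test(counts[, col]))
names(results) <- colnames(counts)

statistics <- sapply(results, function(r) unname(r$statistic))
p_values   <- sapply(results, function(r) r$p.value)
print(round(rbind(X_squared = statistics, p_value = p_values), 4))
```

With expected counts of 25 per cell for FrameShift and 15 per cell for No FrameShift, the statistics are 6.48 and 1.2 (1 degree of freedom each). Each test only asks whether 3p and 5p are equally frequent within one type; neither compares the two types with each other.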

Reply by Apex92 ▴ 320 • 21 months ago