Question

significant values after a fisher test


9.6 years ago

yasjas ▴ 70

Hello everyone,

I am a bit stuck on some data analysis of hepatocyte (healthy) and Huh7 (cancer) samples. I want to find the probes that are significant in each of them.

I ran a Fisher test on a data frame made up of the healthy probe counts, the cancer probe counts and the repeatmasker counts, and came up with a list of significant p-values. Now I would like to filter the ones that are significant in Huh7 and in hepatocytes separately, but the Fisher test returns all the significant ones without specifying which belongs to which.

Any ideas on how I can get this information?

Thanks in advance

R • 2.7k views

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by yasjas ▴ 70


The p-value is of cancer vs. healthy, assuming the total probes are represented by repeatmasker. There's no way to get a p-value for "just Huh7", because that's not a coherent concept.
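
To make that comparison concrete, here is a minimal sketch of the 2x2 table behind a single row's test, using the (AGGGGG)n counts posted further down this thread. The object names `counts`, `tab` and `res` are illustrative and not part of the original code.

```r
# Counts for one repeat family, copied from the data posted in this thread
counts <- c(healthy = 5, cancer = 11, x = 179)

# Row 1: probes observed in each sample
# Row 2: the remainder of the repeatmasker total for that family
tab <- rbind(observed  = counts[1:2],
             remaining = counts[3] - counts[1:2])
print(tab)

# One two-sided p-value for healthy vs. cancer, not one p-value per tissue
res <- fisher.test(tab)
print(res$p.value)
```

The test compares the two columns against each other, which is why there is only one p-value per row.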

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by Devon Ryan 104k


Do you think that I can get the names which are significant in cancer?

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by yasjas ▴ 70


They're the row names.
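
As a rough sketch of what that means in practice: `apply()` over the rows of a data frame returns a vector named by the row names, so the names can be read straight off the filtered p-values. The three rows below are copied from the counts posted in this thread; `significant_names` is an illustrative name, and the 0.05 Bonferroni cut-off is simply the one used elsewhere in the thread.

```r
# Toy data frame with three rows copied from the counts posted in this thread
df <- data.frame(healthy = c(92, 5, 8),
                 cancer  = c(76, 11, 3),
                 x       = c(13606, 179, 2050),
                 row.names = c("(A)n", "(AGGGGG)n", "(ATG)n"))

# apply() over rows keeps the row names as the names of the result
pvalues <- apply(df, 1, function(x) fisher.test(rbind(x[1:2], x[3] - x[1:2]))$p.value)
adjpvalue <- p.adjust(pvalues, method = "bonferroni")

# Names of the rows passing the threshold (may be empty for this toy subset)
significant_names <- names(pvalues)[adjpvalue < 0.05]
print(significant_names)
```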

ADD REPLY • link 9.6 years ago by Devon Ryan 104k


It'd be helpful if you gave an example of exactly what you're doing, including the code. A line or two of the data would help too.

ADD REPLY • link 9.6 years ago by Devon Ryan 104k


#create data frame for all the count files

df1 <- data.frame(healthy)
df2 <- data.frame(cancer)
df3 <- data.frame(repeatmasker_count)

            healthy  cancer      x
(A)n             92      76  13606
(AAATG)n          0       0    372
(AACTG)n          0       0     11
(AAGTG)n          0       0     21
(AATAG)n          1       0    159
(AATTG)n          1       0     96
(ACATG)n          0       0     23
(ACCG)n           0       0      2
(ACCTG)n          0       0      9
(ACGTG)n          0       0      8
(ACTAG)n          0       0     12
(ACTG)n           0       0     51
(ACTTG)n          0       0      7
(AGATG)n          0       0     93
(AGCTG)n          0       1     25
(AGGGGG)n         5      11    179
(AGGTG)n          4       6     63
(AGTAG)n          1       0     15
(AGTTG)n          0       0     15
(ATAAG)n          0       0     34
(ATAGG)n          0       0     31
(ATATG)n          2       0    130
(ATCG)n           0       0      1
(ATCTG)n          0       0      6
(ATG)n            8       3   2050
(ATGGTG)n         6       6    513

#merge all the data frames into one df
df <- merge(df1, df2, by = "row.names", all = TRUE)
rownames(df) <- df$Row.names
df$Row.names <- NULL
df <- merge(df, df3, by = "row.names", all = TRUE)
rownames(df) <- df$Row.names
df$Row.names <- NULL
#probes missing from one sample get a count of 0
df[["cancer"]][is.na(df[["cancer"]])] <- 0
df[["healthy"]][is.na(df[["healthy"]])] <- 0

#calculate pvalue with fisher test
pvalues <- apply(df,1,function(x) fisher.test(rbind(x[1:2],x[3]-x[1:2]))$p.value)
df_pvalues <- data.frame(pvalues)

#calculate adjusted pvalue
adjpvalue <- p.adjust(pvalues,method="bonferroni")
df_pvalue_adjpvalue<- data.frame(df_pvalues,adjpvalue)

#calculate FDR
fdr <- p.adjust(pvalues,method="fdr")
data_final <- data.frame(df_pvalue_adjpvalue,fdr)

#visualise significance
sig.bon = adjpvalue<0.05
significantfisher <- subset(sig.bon,sig.bon ==TRUE)

"(CGGGG)n"  "(GAATG)n"  "(TA)n"     "(TAGA)n"   "(TGGA)n"
"AluJo"     "AluSq2"    "AluSx"     "AluSx1"    "AluSz"
"AT_rich"   "C-rich"    "Charlie4z" "G-rich"    "GC_rich"
"LTR22C"    "LTR40a"    "LTR41"     "LTR41B"    "LTR50"
"LTR7C"     "MER103C"   "MER113"    "MER2"      "MER3"
"MER44B"    "MER51A"    "MER53"     "MER5A"     "MER91C"
"MIR"       "MIR3"      "MIRb"      "MIRc"      "MLT1C"
"MLT1D"     "MSTA"      "SVA_D"

I now have the names of the significant ones. I would like to know which of them are significant in cancer and which in healthy, to see whether any one family has a potential significance in leading to cancer.

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by yasjas ▴ 70

Ram · Answer 1 · 2015-04-14

In the Fisher test, use alternative = 'less' to test whether the enrichment of probes is lower in healthy than in cancer tissue. I have manipulated your data (row 16) to show how this works: the probe at row 16 is now strongly enriched in cancer (171/179) compared with healthy (5/179).

> ht[16,]
          healthy cancer   x
(AGGGGG)n       5    171 179

Let's check it with a Fisher test. A p-value < 0.05 indicates that the enrichment is significantly lower in healthy than in cancer:

> ht$pvalues <- apply(ht,1,function(x) fisher.test(rbind(x[1:2],x[3]-x[1:2]), alternative = "less")$p.value)
> ht[16,]
          healthy cancer   x      pvalues
(AGGGGG)n       5    171 179 1.373839e-84

You can also do the reverse: if you put cancer first and healthy second, you can use the alternative = 'greater' option. It is essentially the same test, but now you are asking whether the enrichment is significantly greater in cancer than in healthy. Below, I have reversed the order of the columns inside the function and used alternative = 'greater':

> ht$pvalues <- apply(ht,1,function(x) fisher.test(rbind(x[c(2,1)],x[3]-x[c(2,1)]), alternative = "greater")$p.value)
> ht[16,]
          healthy cancer   x      pvalues
(AGGGGG)n       5    171 179 1.373839e-84
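
One possible variation, sketched here rather than taken from the answer above: keep the two-sided p-value and read the direction from the odds ratio that `fisher.test` also returns in `$estimate`. This assumes the same column layout (healthy, cancer, x). The helper `fisher_row` and the `direction` labels are hypothetical names, and the first toy row reuses the manipulated (AGGGGG)n counts.

```r
# Toy data: row 1 is the manipulated (AGGGGG)n example, row 2 is the original (ATGGTG)n row
ht <- data.frame(healthy = c(5, 6),
                 cancer  = c(171, 6),
                 x       = c(179, 513),
                 row.names = c("(AGGGGG)n", "(ATGGTG)n"))

# Hypothetical helper: two-sided p-value plus conditional odds ratio for one row
fisher_row <- function(x) {
  ft <- fisher.test(rbind(x[1:2], x[3] - x[1:2]))
  c(pvalue = ft$p.value, odds_ratio = unname(ft$estimate))
}

res <- as.data.frame(t(apply(ht, 1, fisher_row)))
res$padj <- p.adjust(res$pvalue, method = "bonferroni")

# Odds ratio < 1: proportion lower in healthy, i.e. enriched in cancer
# (the same direction that alternative = "less" tests for)
res$direction <- ifelse(res$padj >= 0.05, "not significant",
                        ifelse(res$odds_ratio < 1, "cancer", "healthy"))
print(res)
```

With this layout, each significant row carries a label for the tissue in which the probe is enriched, so the list can be split with something like `rownames(res)[res$direction == "cancer"]`.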