Question

True/False Positives/Negatives in R

10.1 years ago

newscient ▴ 20

I have built a database of genes from different organisms that were found to be regulated by a specific protein complex through ChIP-Seq data analysis. I also have a positive control for all these organisms (genes that have been found to be regulated by the complex) and a negative control for all the organisms (genes that based on a condition have nothing to do with the complex). I want to test the reliability of my analysis by finding the four fundamental numbers (TP, FP, TN, FN) and the derived rates (false positive rate, etc.).

With my "weak" knowledge of testing statistics, I came up with something like this (please check the comments in my code to follow along):

# Make an empty matrix: one row per sample, one column per count
mat2 <- matrix(NA, nrow = nrow(samples.annotationsnew), ncol = 4,
               dimnames = list(samples.annotationsnew$SampleNo, c("TP", "FP", "TN", "FN")))
# Intersection between my database of genes and the positive control -> True Positive
# The rest of the positive control will be the False Negative
for (i in seq_len(nrow(samples.annotationsnew))) {
  for (j in seq_along(pos_ctrls)) {
    if (samples.annotationsnew$ensembl.org[i] == names(pos_ctrls)[j]) {
      tp <- length(intersect(genes2peaksnew[[i]]$feature, pos_ctrls[[j]][, 3]))
      mat2[i, "TP"] <- tp
      mat2[i, "FN"] <- length(pos_ctrls[[j]][, 3]) - tp
    }
  }
}
# Intersection between my database of genes and the negative control -> False Positive
# The rest of the negative control will be the True Negative
for (i in seq_len(nrow(samples.annotationsnew))) {
  for (j in seq_along(neg_ctrls)) {
    if (samples.annotationsnew$ensembl.org[i] == names(neg_ctrls)[j]) {
      fp <- length(intersect(genes2peaksnew[[i]]$feature, neg_ctrls[[j]][, 9]))
      mat2[i, "FP"] <- fp
      mat2[i, "TN"] <- length(neg_ctrls[[j]][, 9]) - fp
    }
  }
}

Is what I am doing realistic, or does it have nothing to do with True/False Positive/Negative condition testing?

Thanks in advance

condition-testing ChIP-Seq R • 6.7k views

updated 2.9 years ago by Ram 45k • written 10.1 years ago by newscient ▴ 20

Ram · Accepted Answer · 2015-03-16

10.1 years ago

Devon Ryan 105k

I'll assume that you have the following character vectors:

pos_ctrls: known positive controls
neg_ctrls: known negative controls
pos_exp: genes regulated by the complex according to your experiment
neg_exp: genes not regulated by the complex according to your experiment

TP = sum(pos_ctrls %in% pos_exp)
FP = sum(neg_ctrls %in% pos_exp)
TN = sum(neg_ctrls %in% neg_exp)
FN = sum(pos_ctrls %in% neg_exp)

You can adapt this to the data structures you actually have. Note that sum() works here because TRUE is treated as 1 (and FALSE as 0).
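As a rough illustration of the four lines above, here is a minimal sketch with toy gene IDs. Everything in it is made up for the example: the gene names, the helper `confusion_counts()`, and the choice of which rates to report are assumptions, not part of the original analysis.

```r
# Toy gene IDs, invented purely for illustration
pos_ctrls <- c("g1", "g2", "g3", "g4")   # known targets of the complex
neg_ctrls <- c("g5", "g6", "g7", "g8")   # known non-targets
pos_exp   <- c("g1", "g2", "g3", "g5")   # called "regulated" by the experiment
neg_exp   <- c("g4", "g6", "g7", "g8")   # called "not regulated" by the experiment

# Hypothetical helper wrapping the four %in% counts
confusion_counts <- function(pos_ctrls, neg_ctrls, pos_exp, neg_exp) {
  c(TP = sum(pos_ctrls %in% pos_exp),
    FP = sum(neg_ctrls %in% pos_exp),
    TN = sum(neg_ctrls %in% neg_exp),
    FN = sum(pos_ctrls %in% neg_exp))
}

counts <- confusion_counts(pos_ctrls, neg_ctrls, pos_exp, neg_exp)

# Common rates derived from the four counts
rates <- with(as.list(counts), c(
  sensitivity = TP / (TP + FN),   # true positive rate
  specificity = TN / (TN + FP),   # true negative rate
  FPR         = FP / (FP + TN),   # false positive rate
  precision   = TP / (TP + FP)
))

print(counts)
print(rates)
```

With per-organism lists, the same helper could be called once per organism (for example inside a loop or `Map()`), provided each list element is first reduced to a character vector of gene IDs.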


Thanks for your quick answer,

So the neg_exp dataset (genes not regulated by the complex according to my experiment) could be the genes that I discarded by filtering during the ChIP-seq data analysis, right?

10.1 years ago by newscient ▴ 20

Coming up with negatives is always difficult. Are the genes you are referring to the "genes that based on a condition have nothing to do with the complex", as you state? If so, it should be fine. The most important part is always writing up exactly what you consider your positive and negative sets.

updated 2.9 years ago by Ram 45k • written 10.1 years ago by David Westergaard ★ 1.5k

Correct, neg_exp will have to be those discarded or without peaks in your dataset. Note that your control dataset really needs to match the ChIP-seq experimental conditions as closely as possible. Any biological change would make the TP/TN/FP/FN metrics meaningless.
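The two ways of defining neg_exp mentioned here (discarded genes only, or every gene without a peak) can give different TN counts. A small sketch with toy data, assuming hypothetical vectors `all_genes` (every annotated gene of the organism) and `discarded` (genes dropped by filtering), neither of which appears in the original code:

```r
# Toy data, invented for illustration
all_genes <- paste0("g", 1:10)            # assumed: every annotated gene
pos_exp   <- c("g1", "g2", "g3", "g5")    # genes with peaks that passed filtering
discarded <- c("g4", "g6")                # assumed: genes removed by filtering

pos_ctrls <- c("g1", "g2", "g3", "g4")
neg_ctrls <- c("g5", "g6", "g7", "g8")

# Option 1: negatives are only the genes discarded by filtering
neg_exp_discarded <- discarded
# Option 2: negatives are all genes without a retained peak
neg_exp_all <- setdiff(all_genes, pos_exp)

TN_discarded <- sum(neg_ctrls %in% neg_exp_discarded)
TN_all       <- sum(neg_ctrls %in% neg_exp_all)

# Control genes that are in neither pos_exp nor neg_exp are silently
# left out of all four counts, so it is worth listing them
uncounted <- setdiff(c(pos_ctrls, neg_ctrls), c(pos_exp, neg_exp_discarded))

print(c(TN_discarded = TN_discarded, TN_all = TN_all))
print(uncounted)
```

Whichever definition is used, reporting the number of uncounted control genes alongside the rates makes it clear how much of the control set the metrics actually cover.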

updated 2.9 years ago by Ram 45k • written 10.1 years ago by Devon Ryan 105k

Yes, your point about the control dataset is totally correct! But the genes "without peaks" would be the rest of the organism's genes, which I also find meaningless, so I will stick with the discarded ones! Thanks again.

10.1 years ago by newscient ▴ 20