True/False Positives/Negatives in R
9.7 years ago
newscient ▴ 20

I have built a database of genes from different organisms that were found to be regulated by a specific protein complex through ChIP-Seq data analysis. I also have a positive control for each organism (genes known to be regulated by the complex) and a negative control for each organism (genes that, based on a condition, have nothing to do with the complex). I want to test the reliability of my analysis by computing the four fundamental counts (TP, FP, TN, FN) and the derived rates (false positive rate, etc.).

With my limited knowledge of statistical testing, I came up with the following (please read the comments in the code to follow the logic):

# Make an empty matrix: one row per sample, one column per count
mat2 <- matrix(NA, nrow = nrow(samples.annotationsnew), ncol = 4,
               dimnames = list(samples.annotationsnew$SampleNo, c("TP", "FP", "TN", "FN")))
# Intersection between my database of genes and the positive control -> True Positives
# The rest of the positive control -> False Negatives
for (i in seq_len(nrow(samples.annotationsnew))) {
  for (j in seq_along(pos_ctrls)) {
    if (samples.annotationsnew$ensembl.org[i] == names(pos_ctrls[j])) {
      hits <- intersect(genes2peaksnew[[i]]$feature, pos_ctrls[[j]][, 3])
      mat2[i, "TP"] <- length(hits)
      mat2[i, "FN"] <- length(pos_ctrls[[j]][, 3]) - length(hits)
    }
  }
}
# Intersection between my database of genes and the negative control -> False Positives
# The rest of the negative control -> True Negatives
for (i in seq_len(nrow(samples.annotationsnew))) {
  for (j in seq_along(neg_ctrls)) {
    if (samples.annotationsnew$ensembl.org[i] == names(neg_ctrls[j])) {
      hits <- intersect(genes2peaksnew[[i]]$feature, neg_ctrls[[j]][, 9])
      mat2[i, "FP"] <- length(hits)
      mat2[i, "TN"] <- length(neg_ctrls[[j]][, 9]) - length(hits)
    }
  }
}

Is what I am doing reasonable, or does it have nothing to do with true/false positive/negative condition testing?

Thanks in advance

condition-testing ChIP-Seq R • 6.6k views
9.7 years ago

I'll assume that you have the following character vectors:

  • pos_ctrls: known positive controls
  • neg_ctrls: known negative controls
  • pos_exp: genes regulated by the complex according to your experiment
  • neg_exp: genes not regulated by the complex according to your experiment

TP = sum(pos_ctrls %in% pos_exp)
FP = sum(neg_ctrls %in% pos_exp)
TN = sum(neg_ctrls %in% neg_exp)
FN = sum(pos_ctrls %in% neg_exp)

You can adapt this to the data structures you actually have. Note that sum() works here because %in% returns a logical vector, and TRUE is counted as 1 and FALSE as 0.
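As a minimal sketch of how these counts turn into the rates asked about in the question, here is a toy example. The gene IDs are invented for illustration only, and the `rates` vector is a name introduced here, not part of the original answer.

```r
# Toy gene IDs, invented purely for illustration
pos_ctrls <- c("g1", "g2", "g3", "g4")  # known positive controls
neg_ctrls <- c("g5", "g6", "g7")        # known negative controls
pos_exp   <- c("g1", "g2", "g5", "g8")  # called as regulated in the experiment
neg_exp   <- c("g3", "g4", "g6", "g7")  # called as not regulated in the experiment

# %in% returns a logical vector; sum() counts the TRUE entries
TP <- sum(pos_ctrls %in% pos_exp)
FP <- sum(neg_ctrls %in% pos_exp)
TN <- sum(neg_ctrls %in% neg_exp)
FN <- sum(pos_ctrls %in% neg_exp)

# Standard rates derived from the four counts
rates <- c(
  sensitivity = TP / (TP + FN),  # true positive rate
  specificity = TN / (TN + FP),  # true negative rate
  FPR         = FP / (FP + TN),  # false positive rate
  FNR         = FN / (FN + TP),  # false negative rate
  precision   = TP / (TP + FP)   # positive predictive value
)
print(rates)
```

Two things to keep in mind when adapting this: duplicated IDs in a control vector are counted more than once (wrap the vectors in unique() if that is not wanted), and a control gene that appears in neither pos_exp nor neg_exp contributes to none of the four counts.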

Thanks for your quick answer,

Could the neg_exp dataset (genes not regulated by the complex according to my experiment) be the genes that I discarded by filtering during the ChIP-seq data analysis?

Coming up with negatives is always difficult. Are the genes you are referring to the ones you describe as "genes that based on a condition have nothing to do with the complex"? If so, it should be fine. The most important part is always to write up exactly what you consider your positive and negative sets.

Correct, neg_exp will have to be the genes that were discarded or that have no peaks in your dataset. Note that your control dataset really needs to match the ChIP-seq experimental conditions as closely as possible. Any biological difference between them would make the TP/TN/FP/FN metrics meaningless.
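The two ways of defining neg_exp can be sketched as follows. This is only an illustration: `all_genes` and `discarded` are hypothetical vectors that you would have to build from your own annotation and filtering steps, and the gene IDs are invented.

```r
# Hypothetical inputs, invented for illustration
all_genes <- paste0("g", 1:8)       # every annotated gene in the organism
pos_exp   <- c("g1", "g2", "g5")    # genes with a retained peak
discarded <- c("g3", "g6")          # genes whose peaks were removed by filtering

# Option 1: every gene without a retained peak
neg_exp_no_peak <- setdiff(all_genes, pos_exp)

# Option 2: only genes that had a peak which was filtered out
neg_exp_discarded <- setdiff(discarded, pos_exp)

print(neg_exp_no_peak)
print(neg_exp_discarded)
```

Option 1 usually gives a much larger negative set than option 2, so whichever definition is used should be stated explicitly alongside the reported counts.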

Yes, your point about the control dataset is totally correct! But the genes "without peaks" would be all the remaining genes of the organism, which I also find meaningless, so I will stick with the discarded ones! Thanks again.
