Question

expression analysis of miRNAs

10.1 years ago

adnanjaved1988 ▴ 80

Hey All

I am confused about one part of my analysis.

What I need to do is extract up-regulated and down-regulated miRNAs from my data frame. I have a data frame with 5 samples: A, B, C, D and E. A is the parent (reference) sample and the rest of the samples are from patients. Each row represents a miRNA, and the value in each column is the background-subtracted value of that miRNA in that sample. On this basis I want to extract the miRNAs that are up-regulated and down-regulated in each sample. Since I have no replicates, there really aren't any statistical tests that make sense. So I want to divide B, C, D, and E by A. This gives me the fold change for each sample with respect to sample A, the parent. Then I can filter my rows (UP is > 1 and DOWN is < 1). I am able to do this for two columns but not for 5 columns.

My data look like this:

                                               A              B              C              D              E
hsa-miR-199a-3p, hsa-miR-199b-3p               NA             13.13892       5.533703       25.67405       NA
hsa-miR-365a-3p, hsa-miR-365b-3p               15.70536       52.86558       18.467540      223.51424      31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p             NA             21.41597       5.964772       NA             24.26073
hsa-miR-3689b-3p, hsa-miR-3689c                9.58696        44.56490       10.102051      13.26785       NA  
hsa-miR-4520a-5p, hsa-miR-4520b-5p             18.06865       28.06991       NA             NA             NA
hsa-miR-516b-3p, hsa-miR-516a-3p               NA             10.77471       8.039662       NA             NA
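
A minimal sketch of the plain fold-change step described above, assuming the table is stored in a data frame called `d` with columns A to E. The object names `fc`, `up` and `down` are illustrative only. It covers just the simple division by A, so any ratio involving an NA stays NA here.

```r
# Example rows copied from the table above
d <- data.frame(
  A = c(NA, 15.70536, NA, 9.58696, 18.06865, NA),
  B = c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471),
  C = c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662),
  D = c(25.67405, 223.51424, NA, 13.26785, NA, NA),
  E = c(NA, 31.93503, 24.26073, NA, NA, NA),
  row.names = c("hsa-miR-199a-3p, hsa-miR-199b-3p",
                "hsa-miR-365a-3p, hsa-miR-365b-3p",
                "hsa-miR-3689a-5p, hsa-miR-3689b-5p",
                "hsa-miR-3689b-3p, hsa-miR-3689c",
                "hsa-miR-4520a-5p, hsa-miR-4520b-5p",
                "hsa-miR-516b-3p, hsa-miR-516a-3p")
)

# Divide each patient column by the reference column A.
# A matrix divided by a vector of length nrow recycles the vector down each column.
fc <- as.matrix(d[, c("B", "C", "D", "E")]) / d$A

up   <- fc > 1   # TRUE where the value is higher than in A, NA where undefined
down <- fc < 1   # TRUE where the value is lower than in A, NA where undefined

# miRNAs up-regulated in sample B relative to A (which() skips NA ratios)
names(which(up[, "B"]))
```

Rows where A is NA drop out of both `up` and `down`, which is the case the conditions below try to handle.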

Now I first want to divide B/C/D/E by A,

but I have to take care of these conditions:

if B, C, D and E are all NA ---> result is NA

Now take B and C (expression of C with respect to B, i.e. C/B):

if the numerator (C) is NA ---> result = NA

if the denominator (B) is NA ---> result = value of C (the numerator). Why? When I compare C with respect to B, if the miRNA was expressed in B but not in C then the result should be NA, and if the miRNA was not expressed in B but is expressed in C then the result should be C (the updated value of that miRNA).

else I simply divide (C/B) and store that in result. Next, D is divided by result

with the same NA conditions for numerator and denominator, and result is updated again. Then E is divided by the updated result under the same NA conditions.

A              B              C              D              E
18.06865       28.06991       NA             441.00         NA

Let's suppose I compute B/C/D/E for this row:

C/B ------> result = NA (C is NA)

D/result ------> result = 441.00 (updated, because the previous result is NA)

E/441.00 ------> result = NA (E is NA)

Now I can divide that result by A ----> result/A = NA
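
The update rule spelled out above can be sketched as a two-argument step function folded over B, C, D, E with `Reduce()`. The function name `step_ratio` is made up for illustration, and the direction (next value divided by the running result) is an assumption taken from the worked example.

```r
# One step of the running ratio: 'result' is the value so far, 'x' the next sample
step_ratio <- function(result, x) {
  if (is.na(x)) return(NA_real_)   # numerator missing: result is NA
  if (is.na(result)) return(x)     # denominator missing: take the new value
  x / result                       # both present: ordinary ratio
}

row <- c(B = 28.06991, C = NA, D = 441.00, E = NA)

# accumulate = TRUE keeps the running result after each sample
steps <- Reduce(step_ratio, row, accumulate = TRUE)
final <- steps[length(steps)]

# Final division by A under the same NA rules: this computes final / A
A <- 18.06865
step_ratio(A, final)
```

Keeping the intermediate values in `steps` makes it easy to compare each stage against a hand calculation.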

I would really appreciate your help

Best
Adnan Javed

R • 2.9k views

ADD COMMENT • link updated 3.6 years ago by Ram 44k • written 10.1 years ago by adnanjaved1988 ▴ 80

Something/NA==NA

Aside from that, it's really unclear what your question is.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 10.1 years ago by Devon Ryan 104k

Hey Devon Ryan

Sorry, I know it's a bit confusing, or maybe the way I explained it is confusing you.

Below is the code for two columns. I want to check up- and down-regulation of miRNAs. The possibilities are:

compare between two samples, or compare between all samples.

Suppose A is the parent and B is the disease sample, and you want to see whether a miRNA is up-regulated or down-regulated in the patient. If you have no replicates you would do it like this:

B/A (expression of B with respect to A). But when you have NA values in the data you have to deal with different conditions. I have to keep the NA values in the data; otherwise I would have replaced them with 0 or removed them. That is why I listed the different conditions in my post for comparing 4 columns at the same time.

This is the code for two columns, and now I want to compare 4 columns:

files <- list.files(pattern = "\\.txt$")  # escape the dot so only *.txt files match
d <- lapply(files, function(x) read.table(x, header = TRUE, sep = "\t"))
d <- data.frame(d)  # bind the per-file tables side by side
rnames <- as.matrix(d[1:2019, 1])
d1 <- as.matrix(d[1:2019, c(4, 12, 20, 28, 36)])
rownames(d1) <- rnames
d1 <- data.frame(d1)

colnames(d1) <- c("A", "B", "C", "D", "E")

tem <- data.frame(tem = d1[, 2])  # numerator: sample B
div <- data.frame(div = d1[, 1])  # denominator: sample A
C <- data.frame(matrix(NA, nrow = 2019, ncol = 1))
for (i in 1:nrow(tem)) {
  for (j in 1:ncol(tem)) {
    if (is.na(tem[i, j]) && is.na(div[i, j])) {
      C[i, j] <- NA                      # both missing
    } else if (is.na(tem[i, j])) {
      C[i, j] <- div[i, j]               # only B missing: keep the value of A
    } else if (is.na(div[i, j])) {
      C[i, j] <- tem[i, j]               # only A missing: keep the value of B
    } else {
      C[i, j] <- tem[i, j] / div[i, j]   # both present: ratio B/A
    }
  }
}
colnames(C) <- c("Regulation")
ab <- cbind(div, tem, C)
colnames(ab) <- c("A", "B", "res")

I want to do the same for 4 columns:

B/C/D/E
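
For comparison, the four cases in the two-column loop can also be written without loops. This is a sketch on small hand-typed vectors instead of the real files, and the helper name `ratio_with_fallback` is hypothetical. It mirrors the loop as posted, in which a missing numerator falls back to the denominator value; that differs from the rule in the original question, where a missing numerator gives NA. Column C is used as the numerator here because, unlike B, it contains an NA in the example rows.

```r
# Vectorised form of the four cases handled by the loop
ratio_with_fallback <- function(num, den) {
  ifelse(is.na(num), den,          # numerator missing: use denominator (NA if both missing)
         ifelse(is.na(den), num,   # denominator missing: use numerator
                num / den))        # both present: ratio
}

A <- c(NA, 15.70536, NA, 9.58696, 18.06865, NA)
C <- c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662)

ab <- data.frame(A = A, C = C, res = ratio_with_fallback(C, A))
ab
```

Because `ifelse()` works on whole vectors, the same helper can be applied to any pair of columns without changing the code.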

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 10.1 years ago by adnanjaved1988 ▴ 80

I am really not a good programmer :-/ and I am getting more confused. What I would like to ask is whether you can write code that gives me the final result of B/C/D/E in a regulation column; don't consider A. Let's say I have one data frame and I add a new column, result, to it. What I am trying to do is divide B, C, D and E in turn and store the outcome in result, row by row. Forget about the previous explanations :) Maybe this makes it clearer.

So my result values after fulfilling the conditions should look like this. For the first row:

C/B ---> result = 0.4211688, then D/0.4211688 = 25.67405/0.4211688 = 60.95905, and finally E/60.95905, which is NA/60.95905, so the final value in result should be NA.

For the second row:

18.467540/52.86558 = 0.3493301
223.51424 /0.3493301  = 639.8368
31.93503/639.8368 = 0.04991121

Similarly for the 4th row:

                                    A       B        C          D            E
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785     NA

10.102051/44.56490 = 0.2266818
13.26785/0.2266818 = 58.53072
NA/58.53072 = NA


d
                                          A        B         C         D
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA
                                          E       
hsa-miR-199a-3p, hsa-miR-199b-3p         NA
hsa-miR-365a-3p, hsa-miR-365b-3p   31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c          NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA

Thank you so much for your help I really appreciate your time :)

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 10.1 years ago by adnanjaved1988 ▴ 80

I wrote those conditions because I am looking for fold change.

If a miRNA was not expressed in one sample but is expressed in the next sample, then I have to report its new value.

A        B         C         D    E
NA       10.77471  8.039662  NA   6.22

8.039662/10.77471 = 0.7461604
NA/0.7461604 = NA

But in E it is expressed, so if I do the same, 6.22/NA, the result would be NA, which is not right. The result should be 6.22, which shows that the miRNA is expressed in sample E.

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 10.1 years ago by adnanjaved1988 ▴ 80

Ah, so you want some sort of cumulative ratio. I'd have to think of the best way to do that, since it's such an uncommon thing to want to do. I suppose one could apply() a function to subset your initial matrix into a list of submatrices and then lapply() a function to just apply() the cumulative ratio to the rows using a for loop. You might just give that a try.
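
One way to try the suggestion above is to `apply()` a small for-loop function over the rows. This is only a sketch: the function name `cumulative_ratio` is hypothetical, and the NA rules are the ones described earlier in this thread (a missing new value gives NA; a missing running result is replaced by the new value).

```r
# Example table from this thread
d <- data.frame(
  A = c(NA, 15.70536, NA, 9.58696, 18.06865, NA),
  B = c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471),
  C = c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662),
  D = c(25.67405, 223.51424, NA, 13.26785, NA, NA),
  E = c(NA, 31.93503, 24.26073, NA, NA, NA),
  row.names = c("hsa-miR-199a-3p, hsa-miR-199b-3p",
                "hsa-miR-365a-3p, hsa-miR-365b-3p",
                "hsa-miR-3689a-5p, hsa-miR-3689b-5p",
                "hsa-miR-3689b-3p, hsa-miR-3689c",
                "hsa-miR-4520a-5p, hsa-miR-4520b-5p",
                "hsa-miR-516b-3p, hsa-miR-516a-3p")
)

# Running ratio over one row: start with the first value, then fold in the rest
cumulative_ratio <- function(v) {
  v <- unname(v)
  result <- v[1]
  for (x in v[-1]) {
    if (is.na(x)) {
      result <- NA_real_      # new value missing: result becomes NA
    } else if (is.na(result)) {
      result <- x             # running result missing: restart from the new value
    } else {
      result <- x / result    # both present: new value over running result
    }
  }
  result
}

# Apply row by row to columns B..E (A is left out, as requested in the thread)
d$result <- apply(as.matrix(d[, c("B", "C", "D", "E")]), 1, cumulative_ratio)
d
```

On the example table this reproduces the values worked out by hand in the thread: about 0.0499 for the hsa-miR-365a row and NA for the hsa-miR-199a row.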

ADD REPLY • link updated 3.6 years ago by Ram 44k • written 10.1 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-10-27

OK, so I'll restate your problem in a single sentence: "In R when computing the ratio between values in a dataframe and a vector, is there a way to replace resulting NA values with either the vector or dataframe values when one of the latter is not NA?"

This, then, becomes a simple data processing problem. Let us suppose that your values are in a dataframe named d:

> d
                                          A        B         C         D
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA
                                          E
hsa-miR-199a-3p, hsa-miR-199b-3p         NA
hsa-miR-365a-3p, hsa-miR-365b-3p   31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c          NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA

So we could simply do the following:

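# Ratio of every other column to column x (one matrix per reference column)
l <- lapply(1:5, function(x) as.matrix(d[, -x] / d[, x])) # There has to be a nicer way to do this!
# Where the ratio is NA, fall back to the numerator value if it is present
l2 <- mapply(function(x, y) {x[is.na(x)] <- as.matrix(d[, -y])[is.na(x)]; x}, l, 1:5, SIMPLIFY = FALSE)
# Where it is still NA, fall back to the denominator (reference column) value
l3 <- mapply(function(x, y) {x[is.na(x)] <- rep(d[, y], ncol(d[, -y]))[is.na(x)]; x}, l2, 1:5, SIMPLIFY = FALSE)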

I kept the various steps of creating the lists (l, l2, and l3) so you can follow along. I've not heavily tested that.