R - Pairwise Combinations Of Rows From Table?
3
1
Entering edit mode
14.0 years ago

Hi,

Assume a table as below:

X = 

        col1    col2    col3
row1    "A"      "0"     "1"
row2    "B"      "2"     "NA"
row3    "C"      "1"     "2"

I select combinations of two rows, using the code below:

pair <- apply(X, 2, combn, m=2)

This returns a matrix of the form:

pair = 

     [,1] [,2] [,3]
[1,] "A"  "0"  "1" 
[2,] "B"  "2"  NA  
[3,] "A"  "0"  "1" 
[4,] "C"  "1"  "2" 
[5,] "B"  "2"  NA  
[6,] "C"  "1"  "2"

I wish to iterate over pair, taking two rows at a time, i.e. first isolate [1,] and [2,], then [3,] and [4,] and finally, [5,] and [6,]. These rows will then be passed as arguments to regression models, i.e. lm(Y ~ row[i]*row[j]).

I am dealing with a large dataset. Can anybody advise how to iterate over a matrix two rows at a time, assign those rows to variables and pass as arguments to a function?

Thanks,
S ;-)

Edit: In response to the comments, I should specify that my problem concerns SNP and expression data where I aim to do a pairwise multiple regression analysis (first order regression) in order to assess any possible SNP-SNP interactions that may effect the expression phenotype.

r • 23k views
ADD COMMENT
1
Entering edit mode

I disagree, but being a newbie, I will accept your ruling.

ADD REPLY
1
Entering edit mode

ok, I re-opened the question

ADD REPLY
1
Entering edit mode

I suggest to be rather permissive with border-line off-topic questions. What Biostars needs are more good questions. If we close too aggressively, than in particular new people might be discouraged from even asking.

ADD REPLY
1
Entering edit mode

I think generic methods questions are fine, provided that they are linked to a problem in bioinformatics which is clearly stated (as was added in the edit to the question in this case).

ADD REPLY
0
Entering edit mode

sorry, this question was off-topic to me. May be http://stats.stackexchange.com/ would be a better place to ask this.

ADD REPLY
0
Entering edit mode

This is only mildly off-topic, and I don't think this kind of question should be closed. I would guess this question is directly related to a bioinformatics problem being addressed by the poster.

ADD REPLY
0
Entering edit mode

We shouldn't be rather permissive with border-line on-topic questions. What Biostars needs are more good questions. If we close too aggressively, than in particular new people might be discouraged from even asking.

ADD REPLY
4
Entering edit mode
14.0 years ago

Using mapply you can map a function over more than one sequence (list or vector) in one pass e.g. the odd and even rows of a matrix

fn <- function(i, j) paste("i =", i, "j =", j)

odds <- seq(1, 9, by = 2)
evens <- odds + 1

mapply(fn, odds, evens)
ADD COMMENT
1
Entering edit mode

+1 for the apply instead of loop approach

ADD REPLY
1
Entering edit mode

David is right. However, given that Seafoid had gone down that route already, I assumed the dataset was tractable. If the number of combinations were huge, I'd pull them from a generator function in the style of a list comprehension, rather than make them up front.

ADD REPLY
0
Entering edit mode

I agree that apply is in general a great approach. However, the method suggested by Seafoid and Keith of pre-creating all combinations with massively redundant numbers of rows might not scale to genome-scale numbers of SNPs if you're doing all combinations (say, N=500,000 or more).

ADD REPLY
0
Entering edit mode

Yes, I applied some stringent filters on my data and reduced from ~700, 000 to a mere 1,800. This is a first pass analysis so in time the filter stringency will be more lenient and thus the number of SNPs is envisaged to be much larger.

ADD REPLY
2
Entering edit mode
14.0 years ago

A simple idiom for "all pairs of a matrix" M of dim(N, C)

for( i in 1:(N-1) ){
  for( j in (i+1):N ){
    print(paste(i,j))
  }
}

I've had to do this eQTL analysis. For any non-trivial number of genotypes or expression measurements, the naive approach (e.g. building a new lm for each comparison) is too slow. If PHENO is a matrix (not data.frame) of expression values and GENO the set of genotypes for a single locus at all of the samples in PHENO, you want something like

for each GENO
  model = lm( t(PHENO)~GENO )

Extend the linear model to match whatever interactions you want to test. You can extract the p-values of coefficients of a solved linear model with

calculate.lm.pval=function(linear.model){
    # Given a solved linear model, return the p-values of the coefficients
    n.obs = dim(linear.model$residuals)[1]
    n.models = dim(linear.model$coefficients)[2]
    cc = matrix(coef(linear.model), nrow=linear.model$rank, ncol=n.models)
    se = calculate.linear.model.se(linear.model)
    t.stat = cc/se
    rdf = n.obs - linear.model$rank
    dm=data.frame( 2*pt(abs(t.stat), rdf, lower.tail = FALSE)  )
    names(dm) = as.vector(dimnames(linear.model$coefficients)[[2]])
    rownames(dm) = dimnames(linear.model$coefficients)[[1]]
    dm
}
ADD COMMENT
0
Entering edit mode
14.0 years ago

If you want to analyze your SNP-SNP interaction in genomic scale, I think the combinations would be too huge for your computer to fulfill. Luckily, there is now a standard package named MDR, with which your can get the most possible SNP-SNP interaction affecting the phenotype. You can download MDR from here http://sourceforge.net/projects/mdr/, and further references about MDR are redundant on internet.

ADD COMMENT

Login before adding your answer.

Traffic: 1832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6