I have a data set and I want to sort it in the following way in R. I hope I can explain clearly.
1- Sort by the elements seen in the main column (focal SNP). This will give us two chunks, one with all As and one with all Gs.
2- Then, for the first chunk, move to the -1 column and sort by the elements seen there (there are two elements, C/T). This will break the first chunk into two smaller chunks: one with A at the main column and C at the -1 column, and one with A at the main column and T at the -1 column.
3- For the second chunk, move to the -1 column and do the same. I will end up with two smaller chunks: one with G at the main column and C at the -1 column, and one with G at the main column and T at the -1 column.
4- Move to the +1 column and do the same. At each step, each of the existing chunks is partitioned into two new chunks.
In my data, the column names are positions (bp) and the rows are haplotypes.
I do not want to change the contents of any row. I only want to reorder the rows; the columns must stay in their current arrangement. How can I do that?
One idea I tried: when I did this sorting by hand, the result had a shape like a normal distribution. That is why I gave every column a weight obtained from the normal distribution function. I then computed a weighted covariance matrix (number of rows x number of rows) from the dissimilarity coefficients between rows and those weights, and ranked the data using the eigenvectors of the correlation matrix, which includes a penalty for missing data. However, I could not reproduce the result that I reached by hand. My data set is very large, so I am sharing only a small part of it.
-7 -6 -5 -4 -3 -2 -1 Main 1 2 3 4
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A T C G C T C G G G T G
A C C A C C T A G A T G
G C T G C T T G G G T G
A C C A C C T G G A T G
G C T G C T T G G G T G
A C C A C C T G G A T G
A C C A C C T G G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T G G G T G
A C C A T C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C G C T T G A G C T
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
A C C A C C T A G A T G
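This will order the data frame by "Main" and then by "-1" (named minus1 here). You probably should not use numbers as column headers, because they are not valid R names and cannot be used directly with $.
dat[order(dat$Main, dat$minus1), ]
where dat is your full data frame.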
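If the intent is to keep partitioning on every column while moving outward from the focal SNP, the same order() call can be extended to all columns. The following is only a sketch. It assumes the key order is Main, -1, +1, -2, +2 and so on (the question only spells out Main, -1 and +1), and the small data frame is made up for illustration rather than taken from the real data.

```r
# Made-up example; column names are positions relative to the focal SNP.
# colClasses = "character" stops a column of only T/F from being read as logical.
dat <- read.table(text = "
-2 -1 Main 1 2
C T A G A
T C G G G
C T G G A
C C A G A
C T A G G
", header = TRUE, check.names = FALSE, colClasses = "character")

# Turn the column names into numeric offsets, treating "Main" as 0.
offsets <- suppressWarnings(as.integer(names(dat)))
offsets[names(dat) == "Main"] <- 0L

# Assumed key order: closest columns first, minus side before plus side.
key_order <- order(abs(offsets), offsets)

# Sort the rows on all keys at once; the columns keep their original order.
keys <- unname(as.list(dat[key_order]))
sorted <- dat[do.call(order, keys), ]
print(sorted)
```

Because order() breaks ties with each later key in turn, every chunk produced by one column is split again by the next one, which matches the step-by-step partitioning described in the question. A different key order (for example all minus columns first) only requires changing key_order.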
Thank you! Unfortunately, it doesn't give what I want. I guess it is more complicated than that.
See if this works. Here test.txt contains the text from the original post as tab-separated values.
or
output: