Hi all, I have a SNPs file (containing 11 million SNPs) that I was using to create a covariance matrix in Bayenv. Each column in this file corresponds to a population and the rows are SNPs, but every SNP has 2 rows (one per allele). It looks like this (2 * nsnps "rows" and npops "columns"):
7 2 2 0 6 2 2
1 0 0 0 0 0 0
0 2 2 0 0 0 0
1 0 0 0 0 0 0
So in the example above I have 7 populations (columns) and 2 SNPs (4 rows, two per SNP). I need to modify the format of this file a bit. In the new file each row should correspond to one SNP, and the number of columns should be twice the number of populations, because each pair of numbers holds the two allele counts for one population. So the new file should look like this (nsnps "rows" and 2*npops "columns"):
7 1 2 0 2 0 0 0 6 0 2 0 2 0
0 1 2 0 2 0 0 0 0 0 0 0 0 0
I have R code that does this manipulation for me, but R seems to be very slow. Can anyone help me figure out whether there is a way to do it in Perl or Python? I am new to both of them, so I would appreciate any help. Thanks.
Can you show your R code and tell the size of your matrix and how much RAM you have available? Transposing a matrix should be quick in R, unless your matrix is too big and you are swapping to disk.
It's not exactly transposing.
You are right, it is nowhere near transposing.
Which is great because there is no need to load the entire matrix.
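For what it's worth, here is a minimal Python sketch of that streaming idea: read two lines at a time, interleave their columns, and write one line per SNP, so memory use stays constant regardless of the 11 million SNPs. It assumes whitespace-separated counts and exactly two rows per SNP, as in the example in the question. The function names and the space output separator are my own choices for illustration, not anything Bayenv prescribes, so adjust the separator to whatever your downstream tool expects.

```python
def interleave_pair(first, second, sep=" "):
    """Interleave two rows of allele counts column by column.

    Assumes both rows are whitespace-separated and have one value per population.
    """
    a = first.split()
    b = second.split()
    if len(a) != len(b):
        raise ValueError("allele rows have different numbers of populations")
    out = []
    for x, y in zip(a, b):
        out.append(x)  # count for allele 1 in this population
        out.append(y)  # count for allele 2 in this population
    return sep.join(out)


def restructure(lines, sep=" "):
    """Yield one output row per SNP from an iterable of input lines.

    Consumes the input two lines at a time, so the whole file is never held
    in memory. Blank lines are skipped (an assumption about the input).
    """
    it = (ln for ln in lines if ln.strip())
    for first in it:
        second = next(it, None)
        if second is None:
            raise ValueError("odd number of rows: last SNP has only one allele row")
        yield interleave_pair(first, second, sep)


def restructure_file(in_path, out_path, sep=" "):
    """Stream in_path to out_path, writing one interleaved row per SNP."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for row in restructure(src, sep):
            dst.write(row + "\n")
```

Calling something like `restructure_file("snps.txt", "snps_restructured.txt")` (both file names are placeholders) would process the file in a single pass. Because `restructure` is a generator, it can also be fed `sys.stdin` if you prefer to pipe the data through.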
Is it just me, or did someone else also not understand the output format? I feel like such a noob and still cannot figure out what the OP wants. I am glad Wouter figured it out, but I would like to understand what the OP is trying to achieve. It would be nice to learn something new. :)
Ah, now I understand the format and what the OP is trying to achieve. Actually, it is not converting rows to columns in their entirety. Could a moderator change the question title? Otherwise it will be misleading.
I changed it to "restructure", can't think of anything more specific.
Yes, it is much better now and readers will not be misled. Thanks, @Wouter. At least other readers will not simply copy the code, but will read the question first to check whether it matches what they need.
Interleaving columns by row pairs? Interleaving columns by every two rows?
That's a good one! ;)