I am trying to compare two (or more) files containing chromosomal positions in the form 2:282828282828. There are about 70,000 of these positions, and while Excel works for smaller datasets, it is causing me too much trouble at this scale.
What I am trying to do is:
1) compare each position in one file with every position in the other file, and if the value is unique, print that value to a new file
OR, if that's not possible, I can merge the two files into one column so that I can:
2) compare all cells in the merged file, and print the unique values (not their index) to a different file.
I am a little stuck as to where to start for the first option
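A minimal sketch of one possible starting point for the first option, assuming each file holds one position per line and that "unique" means "present in exactly one of the two files". The file names (demo1.txt, demo2.txt, unique_positions.txt) and variable names are placeholders, not part of the original data. Note that setxor also collapses duplicates within a file and returns the result sorted as text.

```matlab
% Write two tiny demo files so the sketch runs on its own; with real data,
% skip this step and point the fopen calls at the actual position files.
fid = fopen('demo1.txt', 'w'); fprintf(fid, '%s\n', '2:100', '2:200', '3:300'); fclose(fid);
fid = fopen('demo2.txt', 'w'); fprintf(fid, '%s\n', '2:200', '3:300', '4:400'); fclose(fid);

% Read each file as a cell array of strings, one position per entry
fid = fopen('demo1.txt'); c = textscan(fid, '%s'); fclose(fid); posA = c{1};
fid = fopen('demo2.txt'); c = textscan(fid, '%s'); fclose(fid); posB = c{1};

% setxor keeps the positions that occur in exactly one of the two files
onlyOne = setxor(posA, posB);

% Write the result, one position per line (placeholder output name)
fid = fopen('unique_positions.txt', 'w');
if ~isempty(onlyOne)
    fprintf(fid, '%s\n', onlyOne{:});
end
fclose(fid);
```

Treating the positions as strings rather than numbers avoids having to split the chromosome from the coordinate at the colon.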
But for the second option I am using MATLAB (as I am not sure where to even start with R) and I was thinking of this code:
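fid = fopen('data1.txt'); a = textscan(fid, '%s'); fclose(fid); a = a{1};  % read the positions as strings
[u, ~, idx] = unique(a); n = accumarray(idx, 1); multiple = find(n > 1);  % n = how often each distinct position occurs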
to find the repeated values, but how do I get MATLAB to write the unique values (where n = 1) to a new file?
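A hedged sketch of how the values with n = 1 might be written out for the second option. The sample cell array stands in for the merged column, and the output name singles.txt is a placeholder. Because the positions contain a colon, they are handled as strings, with unique and accumarray doing the counting.

```matlab
% Hypothetical sample standing in for the merged column of positions
a = {'2:100'; '2:200'; '2:100'; '3:300'};

% u holds the distinct positions; idx maps each entry of a to its row in u
[u, ~, idx] = unique(a);

% Count how many times each distinct position occurs
n = accumarray(idx(:), 1);

% Keep only the positions that occur exactly once
singles = u(n == 1);

% Write them to a text file, one position per line (placeholder name)
fid = fopen('singles.txt', 'w');
if ~isempty(singles)
    fprintf(fid, '%s\n', singles{:});
end
fclose(fid);
```

The isempty check is there because fprintf with no data arguments would otherwise write a single blank line.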
I really appreciate any help you may be able to give. I am trying to learn MATLAB and R and Plink at the same time as actually doing the work.
Is it possible to make it save the output into a file? It works well, but just prints to the screen.
I think I am mixing this with plink, but I wrote this
-write-txt >file3
after the command in an attempt to write to a file. Just
> file3
Thanks