Hi, I have 1000 txt files, each with two columns: a gene symbol column and a mutation status column. I want to join all of these files into one file, which will contain the gene symbol column first, followed by 1000 sample columns of mutation status. For example, I want to join the following input files:
txt file 1:
Gene Sample
A ID1
B ID1
D ID1
txt file 2:
Gene Sample
B ID2
C ID2
E ID2
txt file 3, ... txt file 1000
into the following output file:
Gene ID1 ID2 ID3 ... ID1000
A yes NA ...
B yes yes ...
C NA yes ...
D yes NA ...
E NA yes ...
...
I know the full_join solution in R using the dplyr package, but it needs to read all of the files into R. Does anyone have a simple solution in Unix to do this?
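To make the expected result concrete, here is a minimal sketch of the join described above, written in Python rather than as a Unix one-liner. It reads the files one at a time and keeps only a gene-to-samples map in memory. It assumes each file is whitespace-delimited with a single header line, that the sample ID is the value in the second column, and that a gene listed in a file counts as "yes" for that sample. The function names `build_matrix` and `write_matrix` are made up for illustration.

```python
import sys


def build_matrix(paths):
    """Read each two-column file and record which samples list each gene."""
    samples = {}  # used as an insertion-ordered set of sample IDs
    genes = {}    # gene symbol -> set of sample IDs in which it appears
    for path in paths:
        with open(path) as handle:
            next(handle, None)  # skip the assumed "Gene Sample" header line
            for line in handle:
                fields = line.split()
                if len(fields) < 2:
                    continue  # ignore blank or malformed lines
                gene, sample = fields[0], fields[1]
                samples.setdefault(sample, None)
                genes.setdefault(gene, set()).add(sample)
    return list(samples), genes


def write_matrix(samples, genes, out):
    """Write a tab-separated table: one row per gene, one column per sample."""
    out.write("\t".join(["Gene"] + samples) + "\n")
    for gene in sorted(genes):
        # "yes" if the gene was listed for this sample, otherwise "NA"
        row = ["yes" if s in genes[gene] else "NA" for s in samples]
        out.write("\t".join([gene] + row) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Input files are given on the command line; the table goes to stdout.
    sample_ids, gene_map = build_matrix(sys.argv[1:])
    write_matrix(sample_ids, gene_map, sys.stdout)
```

If this were saved under a hypothetical name such as join_samples.py, it could be run as `python join_samples.py *.txt > merged.tsv`. Note that a file with no data rows would not produce a column in this sketch, because the sample ID is only taken from the data rows.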
Thanks a lot! Xiaoyong
Thank you so much, Pierre. I would appreciate it if you could explain the code in a bit more detail, and how to pipe the output into datamash. It would be very helpful for me. Thanks.
Hi Pierre,
I really appreciate your response. I have modified my question to make it more precise. I haven't tried your solution yet, but I am afraid that it may need to be modified too. Thanks!