Merging the same column into multiple different files
4.0 years ago
mel22 ▴ 100

Hello, I would like to add the same column to several different files. The files have the same structure but come from different samples, and I want to merge each file separately with a SNP position column.
Is there a loop (bash, R, Python, ...) that could do this?

Input files (one per sample):

RS  1-51.Log R Ratio    1-51.B Allele Freq
A28         -0.1656                     1

Column to merge:

RS       Position
A28      5555

Output:

RS  1-51.Log R Ratio    1-51.B Allele Freq        Position
A28         -0.1656                     1          5555

Thank you very much

merge bash • 1.0k views

With tsv-join from tsv-utils:

input:

$ cat first_file.txt second_file.txt 

RS  Position
A28 5555
RS  1-51.Log_R_Ratio    1-51.B_Allele_Freq
A28 -0.1656 1

output:

$ tsv-join -H -f first_file.txt -k RS   --write-all  -1 -a Position second_file.txt

RS  1-51.Log_R_Ratio    1-51.B_Allele_Freq  Position
A28 -0.1656 1   5555

You should provide the first few lines in each file and an example of the desired output. It would be difficult to answer the question without this information.

4.0 years ago

Looks like a task for the merge function in R. See ?merge


Adding to Carlo's point:

  1. Use list.files to get a list of file names/locations - this will be the list of files to read.
  2. Use lapply with read.table on that list to get a list of data.frame objects holding the content of each file. You will want to make column names unique across the individual data.frames so that merging does not add suffixes to identically named columns.
  3. Use Reduce with merge to combine the list from step 2 into a single data frame.
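
Since the question also allows Python, here is a minimal sketch of the same list-read-merge idea using only the standard library. It is a variant rather than a translation of the R steps: instead of reducing everything into one data frame, it writes one merged file per sample, which is what the question asks for. The function names (load_positions, add_position, merge_all), the tab delimiter, and the directory layout are assumptions for illustration; the column names RS and Position come from the question.

```python
import csv
import glob
import os


def load_positions(position_path, key="RS", value="Position"):
    """Read the lookup table into a dict mapping key -> value."""
    with open(position_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[key]: row[value] for row in reader}


def add_position(sample_path, out_path, positions, key="RS", value="Position"):
    """Append the position column to one sample file (a left join on `key`)."""
    with open(sample_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader)
        key_index = header.index(key)  # raises ValueError if the key column is absent
        writer.writerow(header + [value])
        for row in reader:
            # SNPs without a known position get an empty field instead of being dropped.
            writer.writerow(row + [positions.get(row[key_index], "")])


def merge_all(sample_glob, position_path, out_dir):
    """List the sample files, then merge each one with the position table."""
    positions = load_positions(position_path)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for sample_path in sorted(glob.glob(sample_glob)):
        out_path = os.path.join(out_dir, os.path.basename(sample_path))
        add_position(sample_path, out_path, positions)
        written.append(out_path)
    return written
```

Calling merge_all("samples/*.tsv", "positions.tsv", "merged") (hypothetical paths) would write one file per sample into merged/. Keep the position file and the output directory out of the glob pattern, otherwise they would be treated as samples.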

Thanks for elaborating on my rather blunt answer! I just wanted to add that, by default, merge joins on the columns with identical names, so depending on the case it might not be necessary to uniquify the column names.


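That is both a pro and a con. When merging multiple output files from RSEM, for example, such a merge would be faulty, because the files share all of their column names and the default would join on every one of them. IMO a merge should always be done with explicit specification of the key column(s).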
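
To make the pitfall concrete, here is a small Python sketch of an inner join over lists of dict rows. The helper inner_join is hypothetical and the values are made up for illustration; only the column names gene_id and TPM correspond to real RSEM output columns. The ".x"/".y" suffixes imitate R's merge defaults.

```python
def inner_join(left, right, keys, suffixes=(".x", ".y")):
    """Inner-join two lists of dict rows on `keys`; other shared columns get suffixes."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined = []
    for lrow in left:
        for rrow in index.get(tuple(lrow[k] for k in keys), []):
            out = {k: lrow[k] for k in keys}
            for name, val in lrow.items():
                if name not in keys:
                    out_name = (name + suffixes[0]) if name in rrow else name
                    out[out_name] = val
            for name, val in rrow.items():
                if name not in keys:
                    out_name = (name + suffixes[1]) if name in lrow else name
                    out[out_name] = val
            joined.append(out)
    return joined


# Illustrative values only: two samples reporting the same gene with different TPM.
sample_a = [{"gene_id": "g1", "TPM": "10.5"}]
sample_b = [{"gene_id": "g1", "TPM": "3.2"}]

# Joining on every shared column name (what R's merge does by default)
# also requires TPM to be equal, so the gene is lost.
shared = [c for c in sample_a[0] if c in sample_b[0]]
implicit = inner_join(sample_a, sample_b, keys=shared)
print(implicit)  # []

# Naming the key explicitly keeps the gene and both measurements.
explicit = inner_join(sample_a, sample_b, keys=["gene_id"])
print(explicit)  # [{'gene_id': 'g1', 'TPM.x': '10.5', 'TPM.y': '3.2'}]
```

In R the equivalent of the explicit call would be passing by = "gene_id" to merge.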

4.0 years ago
join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.tsv) <(sort -t $'\t' -k1,1 file2.tsv)