I have two files with with two columns the. File 1 CBS11016_00000185-RA h. CBS11016_00000186-RA h. CBS11016_00000187-RA h. CBS11016_00000188-RA h. CBS11016_00000189-RA h. CBS11016_00000190-RA h. CBS11016_00000191-RA h. CBS11016_00000192-RA h. CBS11016_00000193-RA h. CBS11016_00000194-RA h. CBS11016_00000195-RA h. CBS11016_00000196-RA h. CBS11016_00000197-RA h. CBS11016_00000198-RA h. CBS11016_00000199-RA h. CBS11016_00000200-RA h. CBS11016_00000201-RA h. CBS11016_00000202-RA h. CBS11016_00000203-RA h. CBS11016_00000204-RA h. CBS11016_00000205-RA h. CBS11016_00000206-RA h. CBS11016_00000207-RA h. CBS11016_00000208-RA h. CBS11016_00000209-RA h. CBS11016_00000210-RA h. CBS11016_00000211-RA h. CBS11016_00000212-RA h. CBS11016_00000213-RA h. CBS11016_00000214-RA h. CBS11016_00000215-RA h. CBS11016_00000216-RA h. CBS11016_00000217-RA h. CBS11016_00000218-RA h. CBS11016_00000219-RA h. CBS11016_00000220-RA h. CBS11016_00000221-RA h. CBS11016_00000222-RA h. CBS11016_00000223-RA h. CBS11016_00000224-RA h.
File 2
CBS11016_00000185-RA Dip5p CBS11016_00000194-RA hypothetical protein CBS11016_00000189-RA hypothetical protein CBS11016_00000215-RA Threonine--tRNA ligase CBS11016_00000199-RA NAD-dependent protein deacylase CBS11016_00000190-RA Cellulase domain-containing protein CBS11016_00000223-RA 40S ribosomal protein S16 CBS11016_00000224-RA hypothetical protein CBS11016_00000225-RA hypothetical protein CBS11016_00000226-RA hypothetical protein CBS11016_00000230-RA 40S ribosomal protein S1 CBS11016_00000231-RA hypothetical protein CBS11016_00000232-RA 60S ribosomal protein L6 CBS11016_00000233-RA FAS1 domain-containing protein CaO19.3004 CBS11016_00000235-RA hypothetical protein CBS11016_00000239-RA Octanoyltransferase CBS11016_00000245-RA hypothetical protein CBS11016_00000247-RA SsDNA-dependent ATPase CBS11016_00000248-RA MRP-S28 domain-containing protein CBS11016_00000249-RA hypothetical protein
I want a file that can use the column one of file 1 as a reference and match it with the equivalence in column 2 of file 2 I had used awk 'NR==FNR{a[$1]=$2;next} {print $0,a[$1]}' file1 flie2 > newfile
But it rather would combine the column 2 and shorten the length of the strings in file 2. Can any help with a better expression?
Please format this example data using
code
button. In present format it is illegible. You can only provide a couple of rows.Thank you!
Cross-posted: https://stackoverflow.com/questions/62262119
Check
merge
function in R.try
sqldf
in R.Take a look at the commandline tools
sort
andjoin
.Check this post (in R)
https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right