awk command to merge files on one common column
2.6 years ago

Hi,

I have two files:

file 1

A
B
B
C
D
E
E
E

file 2

A P Q 
B R S
C W X
D T U
E G H

I want to match the 1st column of file 1 against the 1st column of file 2, and I want the result to be:

A P Q
B R S
B R S
C W X
D T U
E G H
E G H
E G H

That is, each row of file 2 should be repeated as many times as its key occurs in file 1.

I have used an awk command for this, but it gives me unique entries, not repeated ones.

Command used:

awk -F'\t' 'NR==FNR{a[$1];next} ($1) in a' file1 file2
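The command above loads file1 into the array, so for each file2 line it only checks whether the key exists, and every file2 line is printed at most once. A minimal sketch of the reverse approach, assuming the key is the first whitespace-separated field in both files: read file2 into an array indexed by its first column, then walk file1 and print the stored line once per occurrence. The `printf` lines only create sample files (run it in an empty directory), and `row` and `merged.txt` are illustrative names.

```shell
# Sample inputs from the question (space-separated, as pasted in this post).
printf 'A\nB\nB\nC\nD\nE\nE\nE\n' > file1
printf 'A P Q\nB R S\nC W X\nD T U\nE G H\n' > file2

# While reading the first file named (file2), NR==FNR is true:
# store each whole line under its first column and skip to the next line.
# For every line of file1, print the stored file2 line if the key exists,
# so a key that appears three times in file1 is printed three times.
awk 'NR==FNR { row[$1] = $0; next } ($1 in row) { print row[$1] }' file2 file1 > merged.txt

cat merged.txt
```

This uses awk's default field splitting (any run of spaces or tabs, leading whitespace ignored) instead of `-F'\t'`, because the samples in this post are space-separated. If the real files are strictly tab-separated and the key itself can contain spaces, put `-F'\t'` back. Keys in file1 that are missing from file2 are silently skipped.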

EDIT: OP added biological context:

file1

Chrysiogenetes
Chrysiogenetes
Chrysiogenetes
Coprothermobacterota
Coprothermobacterota
Abditibacteriota
Abditibacteriota
Dictyoglomi
Dictyoglomi

file2

Chrysiogenetes ABCD
Coprothermobacterota WXYZ
Abditibacteriota TUVM
Dictyoglomi  OPLK

Expected output:

Chrysiogenetes ABCD
Chrysiogenetes ABCD
Chrysiogenetes ABCD
Coprothermobacterota WXYZ
Coprothermobacterota WXYZ
Abditibacteriota TUVM
Abditibacteriota  TUVM
Dictyoglomi  OPLK
Dictyoglomi  OPLK

I want to match the taxonomy details to my organism names.
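An array lookup in awk drops organism names that have no taxonomy entry. If every name in file1 should appear in the output, a left-join variant can print a placeholder instead. A minimal sketch, assuming the taxonomy is a single whitespace-free second column as in the sample above; the `Unknownia` line, the `NA` placeholder and the file names `organisms.txt`, `taxonomy.txt` and `annotated.tsv` are hypothetical.

```shell
# Hypothetical sample data: "Unknownia" has no entry in taxonomy.txt.
printf 'Chrysiogenetes\nChrysiogenetes\nDictyoglomi\nUnknownia\n' > organisms.txt
printf 'Chrysiogenetes ABCD\nDictyoglomi  OPLK\n' > taxonomy.txt

# Pass 1 (taxonomy.txt): map organism name -> taxonomy string (2nd field).
# Pass 2 (organisms.txt): print every name, with "NA" when it has no entry,
# so the output has exactly one tab-separated line per input organism.
awk 'BEGIN { OFS = "\t" }
     NR==FNR { tax[$1] = $2; next }
     { print $1, (($1 in tax) ? tax[$1] : "NA") }' taxonomy.txt organisms.txt > annotated.tsv

cat annotated.tsv
```

The double space after `Dictyoglomi` is deliberate: default field splitting treats it like a single separator, so it does not leave an empty field.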

Tags: Linux, shell-scripting, awk
$ join -t $'\t' -1 1 -2 1 file1.txt file2.txt

Chrysiogenetes  ABCD
Chrysiogenetes  ABCD
Chrysiogenetes  ABCD
Coprothermobacterota    WXYZ
Coprothermobacterota    WXYZ
Abditibacteriota    TUVM
Abditibacteriota    TUVM
Dictyoglomi OPLK
Dictyoglomi OPLK

Hi, thank you for your suggestion.

I tried this. I sorted both files, but I still get an error saying that one entry in file 2 is not sorted.
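`join` merges two inputs that must already be sorted on the join field, and it compares keys using the current locale's collation, so a file sorted under a different locale (or containing stray characters such as `\r`) can still be reported as unsorted. A minimal sketch, assuming tab-separated files, that sorts both inputs and runs `join` under the same `LC_ALL=C` setting; the `.sorted` file names are illustrative and the `printf` lines only create sample data (run it in an empty directory).

```shell
# Sample inputs in file1's original (unsorted) order, tab-separated.
printf 'Chrysiogenetes\nChrysiogenetes\nCoprothermobacterota\nAbditibacteriota\nDictyoglomi\n' > file1.txt
printf 'Chrysiogenetes\tABCD\nCoprothermobacterota\tWXYZ\nAbditibacteriota\tTUVM\nDictyoglomi\tOPLK\n' > file2.txt

# Use one collation order (plain byte order) for both sort and join.
export LC_ALL=C
TAB=$(printf '\t')

# Sort each file on its first tab-separated field only.
sort -t "$TAB" -k1,1 file1.txt > file1.sorted
sort -t "$TAB" -k1,1 file2.txt > file2.sorted

# Join on field 1 of both files (join's default join field).
join -t "$TAB" file1.sorted file2.sorted > joined.txt

cat joined.txt
```

Note that `join` emits rows in sorted key order (here `Abditibacteriota` comes first), not in file1's original order; if that order matters, an awk lookup that walks file1 line by line is a better fit.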

 join -t $'\t' -1 1 -2 1 <( tr " " "\t" < file1.txt | tr -d '\r' | sort -t $'\t' -k1,1) <( tr " " "\t" < file2.txt | tr -d '\r' |  sort -t $'\t' -k1,1)

If your copy-paste was accurate, there is a leading space before Abditibacteriota in file1 but not in file2.
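Stray whitespace like this, or Windows line endings (`\r`), makes otherwise identical keys compare as different. A minimal sketch for finding and stripping it with `grep`, `tr` and `sed`; the sample file with a leading space and a trailing `\r` is hypothetical, and `file1.clean.txt` is an illustrative name (run it in an empty directory so the sample does not overwrite a real file1.txt).

```shell
# Hypothetical sample: a leading space on line 2 and a Windows-style
# carriage return at the end of line 3.
printf 'Chrysiogenetes\n Abditibacteriota\nDictyoglomi\r\n' > file1.txt

# List lines (with numbers) that start or end with whitespace;
# [[:space:]] also matches the carriage return.
grep -n -e '^[[:space:]]' -e '[[:space:]]$' file1.txt

# Delete carriage returns, then strip leading and trailing whitespace.
tr -d '\r' < file1.txt | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' > file1.clean.txt

cat file1.clean.txt
```

Running the same cleanup on both files before sorting and joining keeps the keys byte-for-byte comparable.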


This post does not fit the theme of this forum. For simple awk commands, search Stack Overflow.

If there is no biological context, this post will be deleted.


file1

Chrysiogenetes

Chrysiogenetes

Chrysiogenetes

Coprothermobacterota

Coprothermobacterota

Abditibacteriota

Abditibacteriota

Dictyoglomi

Dictyoglomi

File2

Chrysiogenetes ABCD

Coprothermobacterota WXYZ

Abditibacteriota TUVM

Dictyoglomi OPLK

output :

Chrysiogenetes ABCD

Chrysiogenetes ABCD

Chrysiogenetes ABCD

Coprothermobacterota WXYZ

Coprothermobacterota WXYZ

Abditibacteriota TUVM

Abditibacteriota TUVM

Dictyoglomi OPLK

Dictyoglomi OPLK

I want to match taxonomy details with my organism name .


I've moved this content to your question. Next time, please ask the question with proper biological context.

Side note: I corrected some typos in your post. The word "column" is spelled with just one "o"; it is not "coloumn".
