awk command for printing all the repeated matching lines without making them unique
3.0 years ago

I have two files, file1 and file2, which contain taxonomy details.

For example:

file1 (contains only taxonomy IDs, each a number):

9

9

4

4

4

file2 (contains the taxonomy ID along with other taxonomy details):

9 A B C D

4 P Q R S

I want to get output like this:

9 A B C D

9 A B C D

4 P Q R S

4 P Q R S

4 P Q R S

I tried using this command:

awk -F '\t' 'NR==FNR{a[$1];next} ($1) in a' file1 file2

$ cat test1.txt | xargs -I{} sed -n '/^{}[[:space:]]/p' test2.txt

9   A B C D
9   A B C D
4   P Q R S
4   P Q R S
4   P Q R S

$ parallel -k sed -n /{}/p test2.txt :::: test1.txt
9   A B C D
9   A B C D
4   P Q R S
4   P Q R S
4   P Q R S

Why cat into sed?


How is this related to bioinformatics?


I have two taxonomy data files and am trying to map them by taxID. I want to get every repeated matching taxID along with its other details.

3.0 years ago
Dunois ★ 2.8k

You don't need awk for this.

Following the data you shared here, just sort file1 and file2, and use join like so:

$ join -1 1 -2 1 <(sort file1) <(sort file2)
4 P Q R S
4 P Q R S
4 P Q R S
9 A B C D
9 A B C D
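
Note that join emits rows in sorted key order (4 before 9), not in the order the IDs appear in file1. If the original order matters, one option is to tag each ID with its line number before sorting and then sort back on that tag afterwards. The following is only a sketch under stated assumptions: the scratch directory, the sample data, and the names `tmp` and `result` are illustrative, and file2 is assumed to be space-separated with one row per ID.

```shell
# Hypothetical scratch directory and sample data mirroring the question
# (space-separated, which matches join's default field handling).
tmp=$(mktemp -d)
printf '9\n9\n4\n4\n4\n' > "$tmp/file1"
printf '9 A B C D\n4 P Q R S\n' > "$tmp/file2"

# Use one collation order for both sort and join.
export LC_ALL=C
sort -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# 1. awk:  decorate each non-empty ID with its position, e.g. "9 1"
# 2. sort: order by ID so join accepts the input
# 3. join: attach the file2 details, e.g. "9 1 A B C D"
# 4. sort: restore the original order using the position column
# 5. cut:  drop the position column again
result=$(
    awk 'NF { print $1, NR }' "$tmp/file1" |
    sort -k1,1 |
    join - "$tmp/file2.sorted" |
    sort -k2,2n |
    cut -d ' ' -f 1,3-
)

printf '%s\n' "$result"
rm -r "$tmp"
```

With the sample data this prints the `9 A B C D` row twice followed by the `4 P Q R S` row three times, matching the order requested in the question.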
3.0 years ago
join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1) <(sort -t $'\t' -k1,1 file2)
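
An awk-only variant is also possible. The command in the question loads file1 into the array, which collapses the repeated IDs into single keys, and then prints each matching line of file2 only once. A minimal sketch of the reverse order is below: load file2 first, then stream file1 so that every occurrence of an ID prints its row. It assumes file2 is tab-separated with the ID in column 1 and that each ID appears only once in file2. The scratch directory, the sample data, and the names `row` and `result` are illustrative.

```shell
# Hypothetical scratch directory and sample data mirroring the question
# (file2 is assumed to be tab-separated).
tmp=$(mktemp -d)
printf '9\n9\n4\n4\n4\n' > "$tmp/file1"
printf '9\tA\tB\tC\tD\n4\tP\tQ\tR\tS\n' > "$tmp/file2"

# First input (file2): remember each full line under its taxonomy ID.
# Second input (file1): for every ID, duplicates included, print the stored line.
result=$(awk -F '\t' '
    NR == FNR { row[$1] = $0; next }
    $1 in row { print row[$1] }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

Because file1 is read line by line in the second pass, the output keeps file1's order and repeat counts, and no sorting is needed.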