Issue with the join command
2.9 years ago

I have file1 like this:

8       Chrysiogenetes
12      Coprothermobacterota
13      Abditibacteriota
13      Dictyoglomi
36      Rhodothermaeota

I have file2 like this:

Chrysiogenetes
Chrysiogenetes
Chrysiogenetes
Coprothermobacterota
Coprothermobacterota
Abditibacteriota
Abditibacteriota
Dictyoglomi
Dictyoglomi
Rhodothermaeota
Rhodothermaeota

My expected output:

Chrysiogenetes 8
Chrysiogenetes 8
Chrysiogenetes 8
Coprothermobacterota 12
Coprothermobacterota 12
Abditibacteriota 13
Abditibacteriota 13
Dictyoglomi 13
Dictyoglomi 13
Rhodothermaeota 36
Rhodothermaeota 36

I am using this join command:

join -t $'\t' -1 2 -2 1 <(sort file1.txt) <(sort file2.txt)

but I am not getting the expected result, because Abditibacteriota and Dictyoglomi both have "13" and the join command does not handle this properly.

Please help.
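For reference: join matches lines only on the join field (the name column here), so a repeated value in the count column should not by itself break it. A more likely cause, assuming no error message was printed, is sort order: join needs both inputs sorted on their join fields with the same collation, and a plain `sort file1.txt` sorts whole lines, which begin with the count rather than the name. Below is a minimal POSIX sh sketch under those assumptions, using the sample data from the question (file1.txt tab-separated as count, name). The `*.sorted` file names are hypothetical, and the output comes out in alphabetical order, not in file2's original order.

```shell
#!/bin/sh
# Sketch: join file1.txt (count<TAB>name) to file2.txt (one name per line)
# on the name column, after sorting both inputs on their join fields.
set -eu

# Work in a throwaway directory with the sample data from the question.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
printf '8\tChrysiogenetes\n12\tCoprothermobacterota\n13\tAbditibacteriota\n13\tDictyoglomi\n36\tRhodothermaeota\n' > file1.txt
printf '%s\n' Chrysiogenetes Chrysiogenetes Chrysiogenetes \
  Coprothermobacterota Coprothermobacterota Abditibacteriota Abditibacteriota \
  Dictyoglomi Dictyoglomi Rhodothermaeota Rhodothermaeota > file2.txt

tab=$(printf '\t')
# Use the same collation (C locale) for sort and join, otherwise join may
# treat input that sort considers ordered as out of order.
LC_ALL=C sort -t "$tab" -k2,2 file1.txt > file1.sorted   # sort on the name column
LC_ALL=C sort file2.txt > file2.sorted
# -1 2 / -2 1: join column 2 of file1 with column 1 of file2.
# -o 2.1,1.1: print the name (from file2) followed by the count (from file1).
result=$(LC_ALL=C join -t "$tab" -1 2 -2 1 -o 2.1,1.1 file1.sorted file2.sorted)
printf '%s\n' "$result"
```

Every name in file2 gets its count, including both names that share 13; only the row order differs from the expected output.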


What is the issue, and what command have you used to join the files?

$ join -t $'\t' -1 1 -2 2 test2.txt test1.txt

Chrysiogenetes  8
Chrysiogenetes  8
Chrysiogenetes  8
Coprothermobacterota    12
Coprothermobacterota    12
Abditibacteriota    13
Abditibacteriota    13
Dictyoglomi 13
Dictyoglomi 13
Rhodothermaeota 36
Rhodothermaeota 36
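This appears to work without any sorting because, in this sample, test2.txt lists the names in the same relative order as column 2 of test1.txt, so join's line-by-line merge pairs every line (GNU join only complains about unsorted input when some lines fail to pair). That is a property of this sample data, not a guarantee. A small POSIX sh check of that assumption follows; it assumes each name in test2.txt appears as one contiguous block and that every name in test1.txt also occurs in test2.txt.

```shell
#!/bin/sh
# Check whether the unsorted join above can be trusted: the deduplicated
# names in test2.txt must appear in the same order as column 2 of test1.txt.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
printf '8\tChrysiogenetes\n12\tCoprothermobacterota\n13\tAbditibacteriota\n13\tDictyoglomi\n36\tRhodothermaeota\n' > test1.txt
printf '%s\n' Chrysiogenetes Chrysiogenetes Chrysiogenetes \
  Coprothermobacterota Coprothermobacterota Abditibacteriota Abditibacteriota \
  Dictyoglomi Dictyoglomi Rhodothermaeota Rhodothermaeota > test2.txt

# uniq collapses adjacent repeats; cut -f2 takes the name column
# (tab is cut's default delimiter).
if [ "$(uniq test2.txt)" = "$(cut -f2 test1.txt)" ]; then
  status="same order"
else
  status="order differs: sort both inputs on their join fields first"
fi
echo "$status"
```

If the check reports a different order, sorting both files on their join fields before running join is the safer route.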
2.9 years ago
JC 13k

It may be easier to use grep:

for N in $(cat file2.txt); do grep -wF -- "$N" file1.txt | perl -lane 'print "$F[1] $F[0]"'; done > out.txt
cat out.txt
Chrysiogenetes 8
Chrysiogenetes 8
Chrysiogenetes 8
Coprothermobacterota 12
Coprothermobacterota 12
Abditibacteriota 13
Abditibacteriota 13
Dictyoglomi 13
Dictyoglomi 13
Rhodothermaeota 36
Rhodothermaeota 36
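One caveat with grep loops like this: plain `grep $N` treats each name as a regular expression and matches it anywhere in the line, so a name that is a substring of another name would pull in extra lines. `-F` (fixed string) and `-w` (whole word) narrow this down. A small demonstration; "Dictyoglomia" is a hypothetical name invented only to contain "Dictyoglomi" as a substring, and demo.txt is a hypothetical file.

```shell
#!/bin/sh
# Demonstrates substring matching in grep vs. whole-word fixed-string matching.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# "Dictyoglomia" is hypothetical, added only to contain "Dictyoglomi" as a substring.
printf '13\tDictyoglomi\n7\tDictyoglomia\n' > demo.txt

plain=$(grep -c 'Dictyoglomi' demo.txt)    # counts both lines (substring match)
word=$(grep -cwF 'Dictyoglomi' demo.txt)   # counts only the whole-word match
echo "plain=$plain word=$word"
```

`-w` still treats spaces and punctuation as word boundaries, so for names containing spaces an exact field comparison (for example in awk) is the safer choice.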
$ while IFS= read -r line; do grep -wF -- "$line" test1.txt; done < test2.txt | awk -v OFS='\t' '{print $2, $1}'

Chrysiogenetes  8
Chrysiogenetes  8
Chrysiogenetes  8
Coprothermobacterota    12
Coprothermobacterota    12
Abditibacteriota    13
Abditibacteriota    13
Dictyoglomi 13
Dictyoglomi 13
Rhodothermaeota 36
Rhodothermaeota 36
awk -F '\t' 'NR==FNR{a[$1];next} ($2) in a' 
awk 'NR==FNR{a[$2]=$1;next}a[$1]{print $0"\t"a[$1]}'

Hi, please tell me how I can do the same thing with either of the two commands above, and whether it can be done by modifying one of them.
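The second awk command should already do this once it is given file1.txt first and file2.txt second: while reading the first file (`NR==FNR`) it stores each count under its name, and for every line of the second file it prints the name with the stored count, which keeps file2's order and needs no sorting. The first command, run as `file2.txt file1.txt`, only filters file1's lines by name and prints each once, so it does not produce the repeated rows. A sketch with the sample data; it uses `($1 in a)` instead of `a[$1]`, a small change from the posted command, so that a count of 0 would not be skipped.

```shell
#!/bin/sh
# Sketch: look up each name from file2.txt in file1.txt with a single awk pass,
# preserving file2.txt's order (no sorting needed).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
printf '8\tChrysiogenetes\n12\tCoprothermobacterota\n13\tAbditibacteriota\n13\tDictyoglomi\n36\tRhodothermaeota\n' > file1.txt
printf '%s\n' Chrysiogenetes Chrysiogenetes Chrysiogenetes \
  Coprothermobacterota Coprothermobacterota Abditibacteriota Abditibacteriota \
  Dictyoglomi Dictyoglomi Rhodothermaeota Rhodothermaeota > file2.txt

# Pass 1 (NR==FNR, file1.txt): remember the count ($1) keyed by name ($2).
# Pass 2 (file2.txt): for each name seen in file1.txt, print name<TAB>count.
result=$(awk -F '\t' -v OFS='\t' 'NR==FNR { a[$2] = $1; next } ($1 in a) { print $1, a[$1] }' file1.txt file2.txt)
printf '%s\n' "$result"
```

Names in file2.txt that are missing from file1.txt are silently dropped; an `else` branch could print them with a placeholder if that matters.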

3 months ago

Just thought I'd share a more modern and efficient way to join two CSV files on a common identifier: csvtk - https://github.com/shenwei356/csvtk?tab=readme-ov-file

https://bioinf.shenwei.me/csvtk/usage/#join

I found it much easier than trying to use Linux sort and join on files with complex headers.
