compare two columns of two files
3
0
Entering edit mode
5.6 years ago
Sam ▴ 150

Dear All

I have two files of gene list in tab format, I want to compare 2nd column if 1st file file with 1st column of 2nd file and report all columns in both files for each match

any help ?

Thanks

file 1 
1st column        2nd column
chr10_17026 VIT_202s0033g00190
chr18_37402 VIT_210s0003g03400
chr19_39195 VIT_200s0313g00080


files 2
1st column                                                  2nd column 3rd column
VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22
VIT_202s0033g00190  8.48        6.49
VIT_210s0003g03400  8.18        6.05
VIT_210s0116g01020  8.67        5.82
VIT_213s0064g01330  9.02        5.84

output :
chr10_17026 VIT_202s0033g00190     8.48        6.49
chr18_37402 VIT_210s0003g03400     8.18        6.05
chr19_39195 VIT_200s0313g00080     7.86         6.22
awk bash • 1.6k views
ADD COMMENT
0
Entering edit mode

Hello Sam ,

what have you tried so far?

fin swimmer

ADD REPLY
1
Entering edit mode

another tip is to use sort and join commands.

ADD REPLY
0
Entering edit mode

One tip, use R. Import the files into R and use subsetting. There are many tutorials, for example here, so try to do it yourself.

ADD REPLY
2
Entering edit mode
5.6 years ago

The most simple way to do it with awk is like this:

awk 'FNR==NR {lookup[$2]; next} {if ($1 in lookup) print}' file1.tsv file2.tsv 
VIT_202s0033g00190  8.48    6.49
VIT_210s0003g03400  8.18    6.05

You may want to tidy up this type of line, though, as its formatting is inconsistent:

VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22
ADD COMMENT
1
Entering edit mode
5.6 years ago
bioguy24 ▴ 230

Another awk:

awk 'NR==FNR{c[$2]++;next};c[$1] > 0' file1 file2
ADD COMMENT
1
Entering edit mode
5.6 years ago
curious ▴ 810

If you know python you can look at the pandas package. You probably want to perform an inner merge. Learning SQL-like processes can take a long time to get used to, but after a while become second nature. It might not help you now depending on your situation, but you eventually might want to look into pandas. It is my daily driver for working with tab separated data like this. It is incredibly powerful

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

ADD COMMENT

Login before adding your answer.

Traffic: 2716 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6