Can someone help me with searching for overlapping values between two files?
21 months ago
mxm189 • 0

I have two files. The first (file1.csv) contains a single column of 1200 values, each 11 digits long, for instance: 00000001111, 00000001152, etc.

The other file (file2.csv) contains 8 columns and several tabs; its third column holds values that can also be found in 'file1.csv'. Whenever a value overlaps, I would like to print the entire row to a new file (file3, or output.csv). Can someone help me write a script or a single command to solve this problem?

I searched Google to see whether someone had a similar problem and found a bash script that was fairly close to mine, so I gave it to ChatGPT as a template. But ChatGPT is not helping at all. I come seeking guidance: can someone help me out? I have been trying things for several days without making any progress.
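For the task described above, a two-file awk pass is one common approach. The following is only a sketch, and it rests on assumptions not confirmed in the post: that file2.csv is tab-separated, that the ID sits in its third field, and that some lines may carry Windows line endings. The sample rows are made up so the example runs on its own.

```shell
#!/bin/sh
# Sketch only: the sample data is invented to mirror the description
# (file1.csv: one 11-digit ID per line; file2.csv: 8 tab-separated columns, ID in column 3).
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs; the first ID line deliberately ends in CRLF.
printf '00000001111\r\n00000001152\n' > file1.csv
printf 'a\tb\t00000001111\td\te\tf\tg\th\n' > file2.csv
printf 'a\tb\t00000009999\td\te\tf\tg\th\n' >> file2.csv
printf 'a\tb\t00000001152\td\te\tf\tg\th\n' >> file2.csv

# First file (NR == FNR): remember every ID.
# Second file: print each row whose third field is a remembered ID.
# sub(/\r$/, "") drops a trailing carriage return, which would otherwise
# make "00000001111\r" differ from "00000001111".
awk -F '\t' '
    { sub(/\r$/, "") }
    NR == FNR { ids[$1]; next }
    $3 in ids
' file1.csv file2.csv > output.csv

cat output.csv
```

If file2.csv turns out to be comma-separated, changing `-F '\t'` to `-F ','` should be the only adjustment needed. Because the IDs are compared as strings, leading zeros are preserved.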

bash grep awk MATCH • 1.2k views

Thank you, I'll take a look at both of these :)


I don't see any relevance to bioinformatics in this thread.


I tried some things but they didn't work. I used ChatGPT because I'm not good at this. First I tried sorting the two files:

# Sort the input files by the tokumei column
sort -t ',' -k1,1 file1.csv > sorted1.csv
sort -t ',' -k1,1 Database_cleaned.csv > sorted2.csv

And then joining them.

# Join the two CSV files on the "tokumei" column
join -t ',' -1 1 -2 1 sorted1.csv sorted2.csv > merged.csv

Sorting file1 gave me a clean sorted file, but for file2 the first 700 of the 30000 values come back as N/A. Also, the merged.csv file created by join was completely blank. I'm sorry for replying so late: the software we normally use for this was supposedly fixed, and only this Monday did I realize it was still broken. If you could help, I would really appreciate it. Thanks.
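One possible reason for the blank merged.csv is that the commands above sort and join on field 1 of both files, while the original question places the shared values in the third column of the second file. A delimiter mismatch (comma versus tab) would have the same effect. The sketch below assumes a tab-separated second file with the key in field 3; the file contents are hypothetical stand-ins, not the real Database_cleaned.csv.

```shell
#!/bin/sh
# Sketch only: invented sample data; assumes tab-separated input with the key in field 3.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
tab=$(printf '\t')

printf '%s\n' 00000001152 00000001111 > file1.csv
printf 'r1\tx\t00000009999\ty\n' > file2.csv
printf 'r2\tx\t00000001111\ty\n' >> file2.csv

# join needs both inputs sorted on the field it compares,
# using the same collation (forced to C here).
LC_ALL=C sort file1.csv > sorted1.csv
LC_ALL=C sort -t "$tab" -k3,3 file2.csv > sorted2.csv

# -1 1: key is field 1 of sorted1.csv; -2 3: key is field 3 of sorted2.csv.
LC_ALL=C join -t "$tab" -1 1 -2 3 sorted1.csv sorted2.csv > merged.csv

cat merged.csv
```

Note that join writes the key as the first output field, so the matched rows do not keep their original column order. If the original order matters, a lookup-based awk filter avoids that side effect.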
