Sort the input files by the tokumei column

Question

Can someone help me with searching overlapping values between two files?

21 months ago

mxm189 • 0

I have two files. One file (file1.csv) contains a single column with 1200 values, each 11 digits long, for instance: 00000001111 00000001152 etc.

Another file (file2.csv) contains 8 columns and consists of several tabs; the third column holds values that can also be found in 'file1.csv'. If there is an overlapping value, I would like to print the entire row and write it to a new file (file3, or output.csv). Can someone help me write a script or a single command to solve this problem?

I tried Google to see if someone had a similar problem, and I found a bash script that was fairly close to my problem, so I gave it to ChatGPT as a template. But ChatGPT is not helping at all. I come seeking guidance: can someone help me out? I have been trying things for several days but am not making any progress.

bash grep awk MATCH • 1.2k views

ADD COMMENT • link updated 20 months ago by Pierre Lindenbaum 164k • written 21 months ago by mxm189 • 0

Thank you, I'll take a look at both of these :)

ADD REPLY • link 21 months ago by mxm189 • 0

I don't see any relevance to bioinformatics in this thread.

ADD REPLY • link 21 months ago by Mensur Dlakic ★ 28k

Pierre Lindenbaum · Answer 1 · 2023-02-24

2

21 months ago

mohammadhassanj ▴ 260

https://stackoverflow.com/questions/25875368/join-two-csv-files-with-key-value

ADD COMMENT • link 21 months ago by mohammadhassanj ▴ 260

I tried some stuff but it didn't work. I used ChatGPT because I'm not good at this. I tried sorting the two files first:

# Sort the input files by the tokumei column

sort -t ',' -k1,1 file1.csv > sorted1.csv
sort -t ',' -k1,1 Database_cleaned.csv > sorted2.csv

And then joining them.

# Join the two CSV files on the "tokumei" column

join -t ',' -1 1 -2 1 sorted1.csv sorted2.csv > merged.csv

For file1, sorting gave me a clean sorted file, but for file2 the first 700 of the 30000 values came back as N/A. Also, the merged.csv file created by join was completely blank. I'm sorry for replying so late, but the software we normally use for this was supposedly fixed, and only this Monday did I realize it was broken. If you could help, I would really appreciate it. Thanks
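
The commands above sort and join on column 1 of both files, while the question says the matching values are in the third column of the second file. That mismatch may be why merged.csv came out empty. Below is a minimal sketch that needs no sorting: awk keeps the values from the first file in memory and prints every row of the second file whose chosen column is among them. The function name `match_rows`, the comma delimiter and the column number are assumptions for illustration. The carriage-return stripping is there only in case the files were saved on Windows.

```shell
# Hypothetical helper: print rows of a delimited table whose key column
# appears in a one-column key list.
# Usage: match_rows KEYS_FILE TABLE_FILE [DELIMITER] [KEY_COLUMN]
match_rows() {
    keys_file=$1
    table_file=$2
    delim=${3:-,}   # assumed delimiter; pass a tab for tab-separated data
    col=${4:-3}     # assumed key column in the table (third column)
    awk -F "$delim" -v col="$col" '
        { sub(/\r$/, "") }                                # drop a trailing carriage return, if any
        NR == FNR { if ($1 != "") keys[$1]; next }        # first file: remember each key
        $col in keys                                      # second file: print matching rows
    ' "$keys_file" "$table_file"
}
```

It could be called as `match_rows file1.csv Database_cleaned.csv ',' 3 > output.csv`. The keys are compared as text, so leading zeros in the 11-digit values are kept.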

ADD REPLY • link updated 20 months ago by Pierre Lindenbaum 164k • written 20 months ago by mxm189 • 0

score 1 · Answer 2 · 2023-02-24

21 months ago

Pierre Lindenbaum 164k

https://linux.die.net/man/1/join
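
As a rough sketch of how `join` might be applied to the files described in the question: both inputs have to be sorted on the field being joined, and the field number can differ per file (`-1` for the first file, `-2` for the second). The function name `join_on_third`, the comma delimiter and the use of column 3 are assumptions taken from the question, not a tested recipe for the real data.

```shell
# Hypothetical helper: join a one-column key list against column 3 of a
# comma-separated table using sort + join.
# Usage: join_on_third KEYS_FILE TABLE_FILE
join_on_third() {
    keys_file=$1
    table_file=$2
    tmp_dir=$(mktemp -d) || return 1
    # Strip carriage returns and sort each file on its join field.
    # LC_ALL=C makes sort and join agree on byte-wise ordering.
    tr -d '\r' < "$keys_file"  | LC_ALL=C sort -t ',' -k1,1 > "$tmp_dir/keys.sorted"
    tr -d '\r' < "$table_file" | LC_ALL=C sort -t ',' -k3,3 > "$tmp_dir/table.sorted"
    # Field 1 of the key list is matched against field 3 of the table.
    LC_ALL=C join -t ',' -1 1 -2 3 "$tmp_dir/keys.sorted" "$tmp_dir/table.sorted"
    rm -rf "$tmp_dir"
}
```

For example, `join_on_third file1.csv Database_cleaned.csv > merged.csv`. Note that `join` prints the join field first and then the remaining fields, so the column order differs from the input; the `-o` option can restore a specific order.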

ADD COMMENT • link 21 months ago by Pierre Lindenbaum 164k