Does anyone know how to compare key/value pairs in two hash tables against a third file? I'm currently working with three tab-delimited files. The first two files each contain a list of proteins with their Pfam domain IDs, and the third file contains the domain-domain interactions. I need to compare all three files and identify the protein pairs in which every domain of one protein interacts with all of the corresponding domains of the other protein. The input files look like this:
Input file 1
XP_002372137.1 PF00754
XP_002372137.1 PF09118
XP_002372140.1 PF00202
XP_002372145.1 PF03747
Input file 2
XP_002372172.1 PF03446
XP_002372172.1 PF14833
XP_002372174.1 PF05378
XP_002372174.1 PF01968
XP_002372174.1 PF02538
XP_002372177.1 PF07690
Input file 3
XP_002372137.1 PF00754 PF03446 XP_002372172.1
XP_002372137.1 PF00754 PF14833 XP_002372172.1
XP_002372137.1 PF09118 PF03446 XP_002372172.1
XP_002372137.1 PF09118 PF14833 XP_002372172.1
XP_002372140.1 PF00202 PF05378 XP_002372174.1
XP_002372140.1 PF00202 PF01968 XP_002372174.1
XP_002372140.1 PF00202 PF02538 XP_002372174.1
XP_002372145.1 PF03747 PF07690 XP_002372177.1
The output should give the protein IDs whenever the domains in one protein interact with all of the corresponding domains of the other protein:
XP_002372137.1 XP_002372172.1
XP_002372137.1 XP_002372172.1
XP_002372137.1 XP_002372172.1
XP_002372137.1 XP_002372172.1
XP_002372140.1 XP_002372174.1
XP_002372140.1 XP_002372174.1
XP_002372140.1 XP_002372174.1
XP_002372145.1 XP_002372177.1
This is a pure programming question. Please search online or, better, switch to Python (pandas) or R; this operation is much easier with those tools.
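For what it's worth, here is a minimal sketch of the comparison in plain Python, using dictionaries in place of hash tables rather than pandas. It assumes the files really are tab-delimited, that file 3 has the column order protein 1, domain 1, domain 2, protein 2 (as in the example), and that one output line per qualifying row of file 3 is wanted, as in the sample output. The function names are made up for illustration.

```python
import csv
from collections import defaultdict


def read_domains(path):
    """Map each protein ID to its set of Pfam IDs (two-column, tab-delimited file)."""
    domains = defaultdict(set)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                domains[row[0]].add(row[1])
    return domains


def read_interactions(path):
    """Read file 3 as (protein1, domain1, domain2, protein2) tuples (assumed column order)."""
    rows = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 4:
                rows.append(tuple(row[:4]))
    return rows


def fully_interacting_rows(domains1, domains2, interactions):
    """Return one (protein1, protein2) tuple per interaction row whose protein pair
    covers every domain1 x domain2 combination listed in files 1 and 2."""
    # Collect the domain pairs actually observed for each protein pair.
    observed = defaultdict(set)
    for prot1, dom1, dom2, prot2 in interactions:
        observed[(prot1, prot2)].add((dom1, dom2))

    result = []
    for prot1, dom1, dom2, prot2 in interactions:
        doms1 = domains1.get(prot1)
        doms2 = domains2.get(prot2)
        if not doms1 or not doms2:
            continue  # protein missing from file 1 or file 2
        # Every domain of protein 1 must pair with every domain of protein 2.
        required = {(a, b) for a in doms1 for b in doms2}
        if required <= observed[(prot1, prot2)]:
            result.append((prot1, prot2))
    return result
```

To use it, pass the results of `read_domains` on files 1 and 2 and `read_interactions` on file 3 to `fully_interacting_rows`, then print each returned pair joined by a tab. If only unique protein pairs are needed, wrap the result in `set(...)`.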