Question

Compare key/value pairs of Perl hash tables across three input files



4.5 years ago

aiswaryabioinfo

Does anyone know how to compare the key/value pairs in two hash tables against a third file? I'm working with three tab-delimited files. The first two files list proteins with their Pfam domain IDs, and the third file contains the domain-domain interactions. I need to compare all three files and identify the protein pairs in which the domains of one protein interact with all of the corresponding domains of the other protein. The input files look like this:

Input file 1

XP_002372137.1    PF00754
XP_002372137.1    PF09118
XP_002372140.1    PF00202
XP_002372145.1    PF03747

Input file 2

XP_002372172.1    PF03446
XP_002372172.1    PF14833
XP_002372174.1    PF05378
XP_002372174.1    PF01968
XP_002372174.1    PF02538
XP_002372177.1    PF07690

Input file 3

XP_002372137.1    PF00754    PF03446    XP_002372172.1
XP_002372137.1    PF00754    PF14833    XP_002372172.1
XP_002372137.1    PF09118    PF03446    XP_002372172.1
XP_002372137.1    PF09118    PF14833    XP_002372172.1
XP_002372140.1    PF00202    PF05378    XP_002372174.1
XP_002372140.1    PF00202    PF01968    XP_002372174.1
XP_002372140.1    PF00202    PF02538    XP_002372174.1
XP_002372145.1    PF03747    PF07690    XP_002372177.1

The output should list the protein IDs when the domains of one protein interact with all of the corresponding domains of the other protein:

XP_002372137.1    XP_002372172.1
XP_002372137.1    XP_002372172.1
XP_002372137.1    XP_002372172.1
XP_002372137.1    XP_002372172.1
XP_002372140.1    XP_002372174.1
XP_002372140.1    XP_002372174.1
XP_002372140.1    XP_002372174.1
XP_002372145.1    XP_002372177.1
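The "two hash tables" in the question are most naturally a hash of hashes: each protein ID maps to the set of its Pfam domains. A minimal sketch of loading one two-column file that way, assuming whitespace-separated columns (tabs or spaces, since the listings above show spaces). The helper name `read_domains` is hypothetical, and the demo reads an in-memory copy of the first three rows of Input file 1 instead of a real file:

```perl
use strict;
use warnings;

# Read whitespace-separated "protein domain" lines into a hash of hashes,
#   $domains{$protein}{$domain} = 1,
# so each protein maps to the set of its Pfam domain IDs.
# read_domains is a hypothetical helper name used for illustration.
sub read_domains {
    my ($fh) = @_;
    my %domains;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                  # skip blank lines
        my ($protein, $domain) = split ' ', $line;  # awk-style split: tabs or spaces, newline dropped
        $domains{$protein}{$domain} = 1;
    }
    return \%domains;
}

# Demo: an in-memory copy of the first rows of "Input file 1",
# opened as a filehandle via a scalar reference.
my $file1 = "XP_002372137.1\tPF00754\n"
          . "XP_002372137.1\tPF09118\n"
          . "XP_002372140.1\tPF00202\n";
open my $fh, '<', \$file1 or die "cannot open in-memory file: $!";
my $set1 = read_domains($fh);
close $fh;

# Print each protein with its sorted domain set.
for my $protein (sort keys %$set1) {
    print "$protein: ", join(',', sort keys %{ $set1->{$protein} }), "\n";
}
```

This prints `XP_002372137.1: PF00754,PF09118` and `XP_002372140.1: PF00202`. Storing domains as inner hash keys (rather than in an array) removes duplicates and makes "does protein X have domain Y" a constant-time lookup.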


updated 4.5 years ago by JC • written 4.5 years ago by aiswaryabioinfo



This is a pure programming question. Please search online, or better, switch to Python (pandas) or R; this operation is much easier with those tools.

written 4.5 years ago by Ram

score 1 · Answer 1 · 2020-06-16

Not sure if this is what you need:

#!/usr/bin/perl

use strict;
use warnings;

@ARGV == 3 or die "usage: interactions.pl FILE1 FILE2 FILE3 > OUT\n";

my $file1 = shift @ARGV;
my $file2 = shift @ARGV;
my $file3 = shift @ARGV;

my %set1  = ();
my %set2  = ();
my %inter = ();

open (my $f1, "<", "$file1") or die "cannot read $file1\n";
while (<$f1>) {
    chomp;
    my ($p, $d) = split (/\s+/, $_);
    $set1{$p}{$d}++;
}
close $f1;

open (my $f2, "<", "$file2") or die "cannot read $file2\n";
while (<$f2>) {
    chomp;
    my ($p, $d) = split (/\s+/, $_);
    $set2{$p}{$d}++;
}
close $f2;

open (my $f3, "<", "$file3") or die "cannot read $file3\n";
while (<$f3>) {
    chomp;
    my ($p1, $d1, $d2, $p2) = split (/\s+/, $_);
    $inter{"$p1=$p2"}{"$d1=$d2"}++;
}
close $f3;

foreach my $pair (sort keys %inter) { # sorted for reproducible output order
    my ($p1, $p2) = split (/=/, $pair);
    my @d1 = keys %{ $set1{$p1} || {} }; # total domains in p1
    my @d2 = keys %{ $set2{$p2} || {} }; # total domains in p2
    next unless (@d1 and @d2); # skip proteins missing from FILE1 or FILE2
    my $expect = 0; # total expected interactions
    my $total = 0; # total interactions reported
    foreach my $d1 (@d1) {
        foreach my $d2 (@d2) {
            $expect++;
            $total++ if (defined $inter{$pair}{"$d1=$d2"});
        }
    }
    print "$p1\t$p2\n" if ($expect == $total); # print if all interactions were detected
}
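The core test in the final loop (every domain of `p1` paired with every domain of `p2` must appear in FILE3) can also be isolated into a small function. A sketch with a hypothetical helper name, `all_domains_interact`; unlike the counting loop it returns as soon as one domain pair is missing, and it treats an empty domain set as "not matching" (an unguarded counting loop would print such a pair, because 0 expected equals 0 found):

```perl
use strict;
use warnings;

# all_domains_interact is a hypothetical helper name.
# $doms1, $doms2: array refs of Pfam IDs for the two proteins.
# $observed: hash ref keyed "d1=d2" (same key format as $inter{$pair} in the answer).
# Returns 1 only if every (d1, d2) combination was observed.
sub all_domains_interact {
    my ($doms1, $doms2, $observed) = @_;
    return 0 unless @$doms1 && @$doms2;   # no domains known: do not report the pair
    for my $d1 (@$doms1) {
        for my $d2 (@$doms2) {
            return 0 unless exists $observed->{"$d1=$d2"};   # stop at first gap
        }
    }
    return 1;
}

# Demo using the four interactions listed for XP_002372137.1 / XP_002372172.1.
my %observed;
$observed{$_} = 1 for qw(PF00754=PF03446 PF00754=PF14833 PF09118=PF03446 PF09118=PF14833);

print all_domains_interact([qw(PF00754 PF09118)], [qw(PF03446 PF14833)], \%observed)
    ? "complete\n" : "incomplete\n";
```

Inside the answer's loop this would be called as `all_domains_interact(\@d1, \@d2, $inter{$pair})`. Short-circuiting only matters for proteins with many domains; the result is the same as comparing `$expect` with `$total`.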

Testing it:

$ perl interactions.pl file1.txt file2.txt file3.txt
XP_002372137.1  XP_002372172.1
XP_002372140.1  XP_002372174.1
XP_002372145.1  XP_002372177.1
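The script prints each qualifying protein pair once, whereas the expected output in the question repeats a pair once per interacting domain pair. If that literal format is needed, a sketch like the following can expand each pair back out. It assumes `%inter` has the same shape as in the answer (`$inter{"p1=p2"}{"d1=d2"}`) and that the caller passes only the pair keys that passed the completeness check; `expand_pairs` is a hypothetical helper name:

```perl
use strict;
use warnings;

# expand_pairs is a hypothetical helper name.
# $inter: hash ref shaped like %inter in the answer, $inter->{"p1=p2"}{"d1=d2"} = count.
# @complete: "p1=p2" keys that already passed the all-domains check.
# Returns one "p1<TAB>p2" line per distinct observed domain-domain interaction.
sub expand_pairs {
    my ($inter, @complete) = @_;
    my @lines;
    for my $pair (@complete) {
        my ($p1, $p2) = split /=/, $pair;
        push @lines, "$p1\t$p2" for keys %{ $inter->{$pair} };
    }
    return @lines;
}

# Demo with two of the complete pairs from the question's Input file 3.
my %inter = (
    'XP_002372140.1=XP_002372174.1' => {
        'PF00202=PF05378' => 1, 'PF00202=PF01968' => 1, 'PF00202=PF02538' => 1,
    },
    'XP_002372145.1=XP_002372177.1' => { 'PF03747=PF07690' => 1 },
);
print "$_\n" for expand_pairs(\%inter, sort keys %inter);
```

This prints the `XP_002372140.1`/`XP_002372174.1` line three times followed by one `XP_002372145.1`/`XP_002372177.1` line, matching those rows of the question's expected output. Duplicate rows in FILE3 collapse into a single hash key, so they are not repeated.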