Dear all,
I want to merge two data files. The first contains three columns ($a, $c, $d), while the second contains two columns ($b, $c). I need to generate a file with four columns ($a, $b, $c, $d), joining the two files on the shared column $c. I have a problem with my Perl script, shown below. Could anyone help? Or is there an easier way? Thank you very much!
data1.txt:
mir-a gene1 33
mir-a gene2 34
mir-a gene3 89
mir-b gene1 09
mir-b gene3 33
mir-c gene1 86
mir-c gene2 20
data2.txt:
group1 gene1
group1 gene3
group2 gene1
group3 gene1
group3 gene2
merged result should be:
mir-a group1 gene1 33
mir-a group2 gene1 33
mir-a group3 gene1 33
mir-a group3 gene2 34
mir-a group1 gene3 89
mir-b group1 gene1 09
mir-b group2 gene1 09
mir-b group3 gene1 09
mir-b group1 gene3 33
mir-c group1 gene1 86
mir-c group2 gene1 86
mir-c group3 gene1 86
mir-c group3 gene2 20
My problematic Perl script:
#!/usr/bin/perl -w
use strict;
my $file1 = shift;
my $file2 = shift;
die "Usage: $0 data1.txt data2.txt\n" unless defined $file2;
my (%hash1, %hash2, %hash_merge);
# read data1.txt: columns $a, $c, $d
open F1, '<', $file1 or die "Cannot open $file1: $!";
while (my $line1 = <F1>) {
    chomp $line1;
    my ($a, $c, $d) = split /\s+/, $line1;
    $hash1{$a}{$c}{$d} = 1;
}
close F1;
# read data2.txt: columns $b, $c
open F2, '<', $file2 or die "Cannot open $file2: $!";
while (my $line2 = <F2>) {
    chomp $line2;
    my ($b, $c) = split /\s+/, $line2;
    $hash2{$b}{$c} = 1;
}
close F2;
# merge hash1 and hash2 on the shared column $c
foreach my $a (keys %hash1) {
    foreach my $c (keys %{$hash1{$a}}) {
        foreach my $b (keys %hash2) {    # find column $b in data2.txt
            next unless exists $hash2{$b}{$c};
            foreach my $d (keys %{$hash1{$a}{$c}}) {
                $hash_merge{$a}{$b}{$c}{$d} = 1;
            }
        }
    }
}
# print out hash_merge (hash keys are unordered, so row order is arbitrary)
foreach my $a (keys %hash_merge) {
    foreach my $b (keys %{$hash_merge{$a}}) {
        foreach my $c (keys %{$hash_merge{$a}{$b}}) {
            foreach my $d (keys %{$hash_merge{$a}{$b}{$c}}) {
                print "$a\t$b\t$c\t$d\n";
            }
        }
    }
}
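On the "easier way" part of the question: one possible approach, sketched here under assumptions rather than taken from any answer, is to index data2.txt by the shared column $c and then walk data1.txt once, so that no nested hashes are needed and the rows come out in the order of data1.txt. The names merge_on_gene and %groups_for are hypothetical, and the sketch assumes whitespace-separated columns laid out as in the samples above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: joins data1 rows (mir, gene, value) with
# data2 rows (group, gene) on the shared gene column.
sub merge_on_gene {
    my ($data1_lines, $data2_lines) = @_;

    # Map each gene to the groups that list it, in input order.
    my %groups_for;
    for my $line (@$data2_lines) {
        my ($group, $gene) = split ' ', $line;
        next unless defined $gene;    # skip blank or short lines
        push @{ $groups_for{$gene} }, $group;
    }

    # Emit one merged row per (data1 row, matching group) pair.
    my @merged;
    for my $line (@$data1_lines) {
        my ($mir, $gene, $value) = split ' ', $line;
        next unless defined $value;   # skip blank or short lines
        for my $group (@{ $groups_for{$gene} || [] }) {
            push @merged, join("\t", $mir, $group, $gene, $value);
        }
    }
    return @merged;
}

# Assumed usage: perl merge.pl data1.txt data2.txt
if (@ARGV == 2) {
    my ($file1, $file2) = @ARGV;
    open my $fh1, '<', $file1 or die "Cannot open $file1: $!";
    open my $fh2, '<', $file2 or die "Cannot open $file2: $!";
    chomp(my @data1 = <$fh1>);
    chomp(my @data2 = <$fh2>);
    close $fh1;
    close $fh2;
    print "$_\n" for merge_on_gene(\@data1, \@data2);
}
```

Values are kept as strings, so an entry such as 09 is printed unchanged, and genes in data1.txt that appear in no group are simply dropped.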
Alex and dariober, thanks a lot for your input. Really helpful!