Hi,
I have two text files, shown below. I want to output only the lines whose IDs occur in both files. I tried the following grep commands and a Perl script, but neither gave me the output I wanted.
$ grep -w -f file1.txt file2.txt >out.txt
$ grep -wFf file1.txt file2.txt >out.txt
Perl script:
use strict;
use warnings;
use autodie;

my $f1 = shift || "file1.txt";
my $f2 = shift || "file2.txt";
my %results;

# Every whole line of file 1 (newline included) becomes a key with count 1.
open my $file1, '<', $f1;
while (my $line = <$file1>) {
    $results{$line} = 1;
}

# Each occurrence of an identical whole line in file 2 adds 1 to its count.
open my $file2, '<', $f2;
while (my $line = <$file2>) {
    $results{$line}++;
}

# Report lines with a count above 1, highest count first.
foreach my $line (sort { $results{$b} <=> $results{$a} } keys %results) {
    print "$results{$line} Match found: ", $line if $results{$line} > 1;
}
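Why neither attempt matches: the script uses each whole line (coordinates and trailing newline included) as the hash key, and because the coordinate columns differ between the two files, no line of file 1 ever equals a line of file 2. The grep commands fail for the same reason, since -f file1.txt turns each full line of file 1 into a pattern. Below is a minimal sketch that keys on the first column only. It assumes whitespace-separated columns, and the script name common_ids.pl and the helpers ids_in and print_matching are hypothetical.

```perl
#!/usr/bin/env perl
# Sketch: print the lines of file 2 whose first column (the ID) also appears
# in the first column of file 1. Assumes whitespace-separated columns; the
# coordinate columns are ignored when matching.
use strict;
use warnings;
use autodie;

# Hypothetical helper: return a hash ref of ID => 1 for each line's first column.
sub ids_in {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        my ($id) = split ' ', $line;     # ' ' splits on runs of whitespace
        $seen{$id} = 1 if defined $id;   # blank lines give no ID
    }
    return \%seen;
}

# Hypothetical helper: print each line from $fh whose ID is in $ids to $out,
# and return how many lines were printed.
sub print_matching {
    my ($fh, $ids, $out) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        my ($id) = split ' ', $line;
        next unless defined $id && $ids->{$id};
        print {$out} $line;
        $n++;
    }
    return $n;
}

# Only run when two file names are given on the command line.
if (@ARGV == 2) {
    my ($f1, $f2) = @ARGV;
    open my $in1, '<', $f1;
    open my $in2, '<', $f2;
    print_matching($in2, ids_in($in1), \*STDOUT);
}
```

Run it as: perl common_ids.pl file1.txt file2.txt > out.txt. Matching is exact, so AT1G01020.2 in file 1 does not match AT1G01020.1 in file 2; for the rows shown below, it prints the file 2 lines for AT1G01050.1, AT1G01060.1 (both lines), AT1G01070.1, AT1G01090.1 and AT1G01100.1.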
file 1:
AT1G01020.2 89247399:89248747
AT1G01050.1 89271467:89272751
AT1G01060.1 89274076:89277002
AT1G01070.1 89278983:89280958
AT1G01073.1 34927896:34928000
AT1G01090.1 89287790:89289247
AT1G01100.1 89290369:89290713
AT1G01100.3 81592809:81592958
AT1G01130.1 89302125:89303893
...........
file 2:
AT1G01010.1 89243839-89245706
AT1G01020.1 89246997-89247311
AT1G01020.1 89248315-89248745
AT1G01030.1 89251946-89253019
AT1G01040.1 89263598-89270896
AT1G01050.1 89271464-89272749
AT1G01060.1 89274074-89276072
AT1G01060.1 89276890-89277000
AT1G01070.1 89278980-89280956
AT1G01090.1 89287787-89289245
AT1G01100.1 89290366-89290710
...........
What exactly do you need as output? A list of IDs that occur in both files, or similar IDs (matching without the number after the '.')? Do you only need the IDs, or do you need the second column too?
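If "similar IDs" means ignoring the version suffix, the IDs have to be normalized before comparing; with exact matching, AT1G01020.2 in file 1 does not match AT1G01020.1 in file 2. A small sketch, assuming the suffix is always a dot followed by digits at the end of the ID (the helpers base_id and same_gene are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: drop a trailing ".N" suffix,
# e.g. "AT1G01020.2" -> "AT1G01020". IDs without a suffix are returned unchanged.
sub base_id {
    my ($id) = @_;
    (my $base = $id) =~ s/\.\d+\z//;
    return $base;
}

# Hypothetical helper: true when two IDs differ only in their ".N" suffix.
sub same_gene {
    my ($x, $y) = @_;
    return base_id($x) eq base_id($y);
}

print same_gene('AT1G01020.2', 'AT1G01020.1') ? "match\n" : "no match\n";   # prints "match"
```

The same base_id call can be applied to the first column before it is used as a hash key when matching the two files.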
I updated my script, which may be closer to what I want, but it still compares the two files using each full line.
What I need instead is this: if the first column matches between the two files (the remaining columns do not matter), print the result along with some extra information, such as the total number of repeated matches in both files.
Thanks
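To report how often each shared ID repeats, one option is to count the first-column IDs in each file separately and then keep only the IDs present in both. A sketch, assuming whitespace-separated columns and exact ID matching; the script name shared_counts.pl and the helpers count_ids and shared_counts are hypothetical:

```perl
#!/usr/bin/env perl
# Sketch: for every ID (first column) present in both files, report how many
# lines carry that ID in each file and the total. Assumes whitespace-separated
# columns; the coordinate columns are ignored.
use strict;
use warnings;
use autodie;

# Hypothetical helper: count how many lines of $fh start with each ID.
sub count_ids {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        my ($id) = split ' ', $line;
        $count{$id}++ if defined $id;
    }
    return \%count;
}

# Hypothetical helper: rows of [ID, count in file 1, count in file 2, total]
# for IDs found in both files, sorted by total (descending), then by ID.
sub shared_counts {
    my ($c1, $c2) = @_;
    my @rows = map { [ $_, $c1->{$_}, $c2->{$_}, $c1->{$_} + $c2->{$_} ] }
               grep { exists $c2->{$_} } keys %$c1;
    return sort { $b->[3] <=> $a->[3] or $a->[0] cmp $b->[0] } @rows;
}

# Only run when two file names are given on the command line.
if (@ARGV == 2) {
    open my $in1, '<', $ARGV[0];
    open my $in2, '<', $ARGV[1];
    print join("\t", qw(ID file1 file2 total)), "\n";
    for my $row (shared_counts(count_ids($in1), count_ids($in2))) {
        print join("\t", @$row), "\n";
    }
}
```

Run it as: perl shared_counts.pl file1.txt file2.txt. For the rows shown above, AT1G01060.1 would be reported with 1 line in file 1, 2 lines in file 2 and a total of 3.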
The following command works well, but I cannot get it to output additional information, such as the total number of repeated matches of a particular ID in both files.
Once you have a file with the particular IDs you want, try the following command to get a count of each ID:
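A Perl sketch of that counting step, assuming the ID is the first whitespace-separated column of each line; it gives roughly what a cut | sort | uniq -c pipeline would. The script name count_ids.pl and the helper id_counts are hypothetical.

```perl
#!/usr/bin/env perl
# Sketch: count how often each ID (first column) occurs in the named files.
use strict;
use warnings;

# Hypothetical helper: return a hash ref of ID => number of lines with that ID.
sub id_counts {
    my ($fh) = @_;
    my %n;
    while (my $line = <$fh>) {
        my ($id) = split ' ', $line;   # first whitespace-separated column
        $n{$id}++ if defined $id;      # blank lines give no ID
    }
    return \%n;
}

# Only run when at least one file name is given; counts from all files are summed.
if (@ARGV) {
    my %total;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $n = id_counts($fh);
        $total{$_} += $n->{$_} for keys %$n;
    }
    printf "%7d %s\n", $total{$_}, $_ for sort keys %total;
}
```

Run it as: perl count_ids.pl out.txt. If several files are named, their counts are added together.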