Question

How to compare two files on the basis of Two IDs

0

8.2 years ago

Genomebiology • 0

Hi, I wrote a Perl script to compare two files on the basis of two IDs (chromosome and position), but I could not get it to work. Can anyone help with this?

File 1:

chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920

File 2:

chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369

#!/usr/bin/perl -w

$pwd = `pwd`;
chomp($pwd);

$file=$ARGV[0];
$file1=$ARGV[1];

open(IN,$file);
while ($line=<IN>){
    chomp($line);

    @ary = split(/\t/,$line);
    chomp($ary[0]);chomp($ary[1]);

    open(SK,$file1);
    while($line1=<SK>)
    {
        chomp($line1);
        @any = split(/\t/,$line1);
        chomp($any[0]); chomp($any[0]);chomp($any[1]);chomp($any[2]);
        if (($ary[0] eq $any[0] and $ary[1] == $any[1]) or ($ary[0] eq $any[0] and $ary[1] == $any[2]))
        {
            print "$line\tE\n";
        }
        else
        {
            print "$line\tM\n";
        }
    }
}

This code prints multiple lines, and all of them are tagged 'M'. I then tried another script:

#!/usr/bin/perl
use warnings; 
use strict;
use Data::Dumper;

my $file1 = $ARGV[0];
open($infile1,$file1);
my $file2 = $ARGV[1];
open($infile2,$file2);

my %file2_hash;

while (my $line = <$infile1>)
{
   chomp $line;  #so that output with E or M can be on same line
   next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

   my ($chr, $val1, $val2) = split /\s+/,$line;
}
close $infile1;

while (my $line = <$infile2>)
{
chomp $line;   
 next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

   my ($key, $value1, $value2) = split /\s+/, $line; # use better "names" I have
                                           # no idea of what a chr col
   $file2_hash{"$key:$value1:$value2"} = 1;

close $infile2;

   if (exists $file2_hash{"$chr:$val1:$val2"})
   {
      print "$line\tE\n";  # match exists with file 1   
}
   else
   {
   print "$line\tM\n";  # match does NOT exist with file 1

}

}

But again I get the same error.

What would be a possible solution?

perl • 2.0k views

updated 7.2 years ago by mittu1602 ▴ 200 • written 8.2 years ago by Genomebiology • 0
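
One possible fix, offered as a minimal sketch rather than a definitive answer. The first script prints one line for every pair of (file 1 line, file 2 line), so many 'M' lines are expected even when a match exists. If the files are space-separated rather than tab-separated, split(/\t/, ...) also leaves the position column undefined, so nothing can match. The sketch below loads file 2 into a hash once and prints each file 1 line exactly once. The sub names (index_positions, label_lines, read_lines) and the script name in the usage comment are made up for illustration. It assumes whitespace-separated columns, and that a file 1 position counts as a match when it equals either coordinate column of a file 2 line on the same chromosome, as in the original if condition.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup of "chr:pos" keys from file2-style lines (chr, start, end).
# Both coordinate columns are indexed, mirroring the original test
# ($ary[1] == $any[1] or $ary[1] == $any[2]).
# Assumption: positions are plain integers without leading zeros, so
# comparing them as strings is equivalent to comparing them as numbers.
sub index_positions {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;                    # skip blank lines
        my ($chr, $start, $end) = split ' ', $line;  # any whitespace
        next unless defined $end;                    # skip short lines
        $seen{"$chr:$start"} = 1;
        $seen{"$chr:$end"}   = 1;
    }
    return \%seen;
}

# Tag each file1-style line (chr, pos) with E (found) or M (not found).
# Each input line produces exactly one output line.
sub label_lines {
    my ($seen, @lines) = @_;
    my @out;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;
        my ($chr, $pos) = split ' ', $line;
        my $tag = (defined $pos && $seen->{"$chr:$pos"}) ? 'E' : 'M';
        push @out, "$line\t$tag";
    }
    return @out;
}

# Read a whole file into a list of chomped lines.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Usage (hypothetical script name): perl compare_ids.pl file1.txt file2.txt
if (@ARGV == 2) {
    my $seen = index_positions(read_lines($ARGV[1]));
    print "$_\n" for label_lines($seen, read_lines($ARGV[0]));
}
```

Reading file 2 once into a hash avoids reopening it for every line of file 1, and checking open with "or die" reports a wrong path instead of silently producing no matches.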

1

What exactly are you trying to achieve? If the goal is to compare two lists of positions to see what they have in common, the R/Bioconductor package GenomicRanges has a lot of nice functions for that:

findOverlaps(file1, file2)
countOverlaps(file1, file2)
etc.

8.2 years ago by VHahaut ★ 1.2k

score 1 · Answer 1 · 2017-09-25

One way to do this without reinventing the wheel:

Install BEDOPS.

Convert your files file1.txt and file2.unsorted.bed into sorted, tab-delimited BED files:

$ awk -v OFS='\t' '{ print $1, $2, ($2 + 1); }' file1.txt | sort-bed - > file1.bed
$ sort-bed file2.unsorted.bed > file2.bed

Then run set operations:

$ bedops -e 1 file2.bed file1.bed > elements_in_file2_that_overlap_file1.bed
$ bedops -n 1 file2.bed file1.bed > elements_in_file2_that_do_not_overlap_file1.bed

And conversely:

$ bedops -e 1 file1.bed file2.bed > elements_in_file1_that_overlap_file2.bed
$ bedops -n 1 file1.bed file2.bed > elements_in_file1_that_do_not_overlap_file2.bed

Etc.
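
For readers who want to see what the overlap test amounts to, here is a rough Perl sketch of the one-base overlap criterion behind the -e 1 and -n 1 operations. It is not how BEDOPS is implemented, only a brute-force illustration. The helper names (overlaps, partition_by_overlap) are hypothetical, the intervals are assumed to be half-open BED-style [start, end), and the chr7 interval in the toy data is invented for the example.

```perl
use strict;
use warnings;

# Two half-open intervals [start, end) on the same chromosome share at
# least one base when each one starts before the other one ends.
# Each argument is an array ref: [chr, start, end].
sub overlaps {
    my ($x, $y) = @_;
    return $x->[0] eq $y->[0] && $x->[1] < $y->[2] && $y->[1] < $x->[2];
}

# Split the elements of @$query into those that overlap at least one
# element of @$reference (the -e 1 case) and those that overlap none
# (the -n 1 case). Brute force, O(n*m): for illustration only.
sub partition_by_overlap {
    my ($query, $reference) = @_;
    my (@hit, @miss);
    for my $q (@$query) {
        my $found = 0;
        for my $r (@$reference) {
            if (overlaps($q, $r)) { $found = 1; last; }
        }
        if ($found) { push @hit, $q } else { push @miss, $q }
    }
    return (\@hit, \@miss);
}

# Toy data: file 1 positions widened to one-base intervals, as the awk
# step does; the second file 2 interval is invented for the example.
my @file1 = (['chr7', 151046672, 151046673], ['chr3', 127680920, 127680921]);
my @file2 = (['chr1', 66953622, 66953654],   ['chr7', 151046000, 151047000]);

my ($hit, $miss) = partition_by_overlap(\@file2, \@file1);
print "overlap: @$_\n"    for @$hit;
print "no overlap: @$_\n" for @$miss;
```

Note that the sample file 2 rows for chr7 and chr3 have a start coordinate larger than the end coordinate, so they may need cleaning before any interval tool will treat them as valid BED elements.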

score 0 · Answer 2 · 2017-09-25

0

7.2 years ago

mittu1602 ▴ 200

You can also use bedtools, which writes its result to standard output, so redirect it to a file:

$ bedtools intersect -wao -a file1.bed -b file2.bed > Output.bed