How do I replace the values in one file with the values in another file?
6.1 years ago
giegie • 0

I have 2 tab-separated files which look like this:

file1.txt

chr1    710000  715000  143
chr1    715000  720000  144
chr1    720000  725000  145
chr1    725000  730000  146
chr1    730000  735000  147
chr1    735000  740000  148
chr1    740000  745000  149
chr1    745000  750000  150
chr1    750000  755000  151
chr1    755000  760000  152
chr1    760000  765000  153

file2.txt

143 143 84
143 144 26
143 152 32
143 153 15
144 152 11

The expected output:

output.txt

chr1    710000  715000  chr1    710000  715000  84
chr1    710000  715000  chr1    715000  720000  26
chr1    710000  715000  chr1    755000  760000  32
chr1    710000  715000  chr1    760000  765000  15
chr1    715000  720000  chr1    755000  760000  11

I would like to match the unique numbers in file1.txt (column 4) with the numbers in file2.txt (columns 1 and 2) and replace them with the values from file1.txt (columns 1-3). The output.txt should have 7 columns, where the last one has the corresponding value from file2.txt (column 3).

hic • 1.9k views

How is this related to bioinformatics, please?


The files are outputs of the HiC-Pro pipeline, which generates intra- and inter-chromosomal contact maps.


Thank you

6.1 years ago
Nitin Narwade ★ 1.6k

Here is a simple Python script that you can use.

By the way, where are you going to use this file? Is it an input for some specific tool or server?

###
##
##  USAGE: python script.py input1.tsv input2.tsv output.tsv
##
###

import sys

# Read the input and output file names from the command line
try:
    file1 = sys.argv[1]
    file2 = sys.argv[2]
    outputFileName = sys.argv[3]
except IndexError:
    print("ERROR: Missing commandline arguments.\n\n USAGE: python " + sys.argv[0] + " input1.tsv input2.tsv output.tsv")
    sys.exit(1)

try:
    fr = open(file1, "r")
except IOError:
    print("ERROR: Can not open " + file1)
    sys.exit(1)

# Map each bin ID (column 4 of file1) to its coordinates (columns 1-3)
file1Dict = {}

for line in fr:
    tempList = line.split()  # split on any whitespace, so tabs and spaces both work
    if len(tempList) < 4:
        continue  # skip blank or incomplete lines
    file1Dict[tempList[3]] = tempList[0:3]  # chr, start, end

fr.close()

try:
    fr = open(file2, "r")
except IOError:
    print("ERROR: Can not open " + file2)
    sys.exit(1)
try:
    fw = open(outputFileName, "w")
except IOError:
    print("ERROR: Can not create " + outputFileName)
    sys.exit(1)

# Replace both bin IDs of each file2 line with their coordinates and keep the count
for line in fr:
    tempList = line.split()
    if len(tempList) < 3:
        continue
    if tempList[0] in file1Dict and tempList[1] in file1Dict:
        fw.write("\t".join(file1Dict[tempList[0]] + file1Dict[tempList[1]] + [tempList[2]]) + "\n")

fr.close()
fw.close()

print("[INFO] Output written to " + outputFileName)

Not sure why, but the code creates an empty output :( Thank you for your effort; I would be really glad if you could try to correct it. The files are outputs of the HiC-Pro pipeline, and I am using them for downstream analysis of HiChIP data.
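A plausible cause of the empty output (an assumption based only on how the example files are displayed in the question): file2.txt seems to be separated by single spaces rather than tabs. Splitting such a line on "\t" leaves it as one field, which never matches a key in file1Dict, so every line is silently skipped without an error. The made-up lines below illustrate the difference between splitting on a tab and splitting on any whitespace:

```python
# Hypothetical input lines: file1 tab-separated, file2 space-separated,
# mirroring how the two files look in the question.
file1_line = "chr1\t710000\t715000\t143"
file2_line = "143 144 26"

# Splitting on a tab works only when the line really is tab-separated.
print(file1_line.split("\t"))  # four fields: chr, start, end, bin ID
print(file2_line.split("\t"))  # a single field, so it never matches a bin ID key

# str.split() with no argument splits on any run of whitespace,
# so tab- and space-separated lines are handled the same way.
print(file2_line.split())
```

Using `line.split()` instead of `line.split("\t")` makes the lookup independent of which separator the files actually use.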


Dear giegie, I have updated the code above; please try it again.

Run it (assuming you have saved the code as script.py) with a command like: python script.py input1.tsv input2.tsv output.tsv

The output.tsv file will be generated with the contents below:

chr1    710000  715000  chr1    710000  715000  84
chr1    710000  715000  chr1    715000  720000  26
chr1    710000  715000  chr1    755000  760000  32
chr1    710000  715000  chr1    760000  765000  15
chr1    715000  720000  chr1    755000  760000  11

I have tried it with the example data and it works fine for me.

One more thing needs clarifying: is the example data (the one you posted in the question) real HiC-Pro output, or did you create it yourself for the sake of the example?

If you created the sample data yourself, could you please post some real data so that I can test the code on my side?

Thank you.

6.1 years ago
JC 13k

Some Perl:

#!/usr/bin/perl
use strict;
use warnings;

my $file1 = "file1.txt";
my $file2 = "file2.txt";
my $outfile = "output.txt";
my %tags = ();

open (my $f1, "<", $file1) or die "cannot read $file1\n";
while (<$f1>) {
    chomp;
    my ($chr, $ini, $end, $tag) = split (/\s+/, $_);
    $tags{$tag} = "$chr\t$ini\t$end";
}
close $f1;

open (my $f2, "<", $file2) or die "cannot read $file2\n";
open (my $out, ">", $outfile) or die "cannot write $outfile\n";
while (<$f2>) {
    chomp;
    my ($tag1, $tag2, $val) = split (/\s+/, $_);
    next unless (defined $tags{$tag1} and defined $tags{$tag2});
    print $out join "\t", $tags{$tag1}, $tags{$tag2}, $val;
    print $out "\n";
}
close $f2;
close $out;
6.1 years ago

The strategy is straightforward:

  1. read file1 and store its rows in a map/hash/dict keyed by the 4th column;
  2. then read file2 line by line, replacing the 1st and 2nd columns with the values looked up from file1.
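The two steps above can be sketched in Python as a plain dictionary lookup. This is a minimal, in-memory sketch; the function names `build_bin_index` and `replace_ids` are hypothetical, and lines are split on any whitespace so tab- and space-separated inputs both work:

```python
def build_bin_index(bed_lines):
    """Step 1: map the bin ID (4th column) to its coordinates (columns 1-3)."""
    index = {}
    for line in bed_lines:
        fields = line.split()  # tolerate tabs or runs of spaces
        if len(fields) >= 4:
            index[fields[3]] = fields[:3]
    return index


def replace_ids(bed_lines, pair_lines):
    """Step 2: swap the two bin IDs of each pair line for their coordinates."""
    index = build_bin_index(bed_lines)
    out = []
    for line in pair_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        id1, id2, value = fields[:3]
        # Pairs with an ID missing from file1 are dropped, as in the scripts above
        if id1 in index and id2 in index:
            out.append("\t".join(index[id1] + index[id2] + [value]))
    return out


if __name__ == "__main__":
    # A few rows taken from the example files in the question
    bed = ["chr1\t710000\t715000\t143", "chr1\t715000\t720000\t144",
           "chr1\t755000\t760000\t152"]
    pairs = ["143 143 84", "143 144 26", "144 152 11"]
    for row in replace_ids(bed, pairs):
        print(row)
```

Because the whole of file1 is held in a dictionary, each file2 line costs a constant-time lookup, and file2 itself can be streamed line by line.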

Here's a simple solution using an unreleased version of csvtk, just for fun.

# re-arrange columns
$ csvtk cut -H -t -f 4,1-3 file1.txt > file1.re.txt

$ head -n 3 file1.re.txt
143     chr1    710000  715000
144     chr1    715000  720000
145     chr1    720000  725000

# replace value in column 1 and 2 with corresponding value provided by file1.re.txt
$ csvtk replace -H -t -k file1.re.txt -f 1,2 -p '(.+)' -r '{kv}' file2.txt -A
[INFO] read key-value file: file1.re.txt
[INFO] 11 pairs of key-value loaded
"chr1   710000  715000" "chr1   710000  715000" 84
"chr1   710000  715000" "chr1   715000  720000" 26
"chr1   710000  715000" "chr1   755000  760000" 32
"chr1   710000  715000" "chr1   760000  765000" 15
"chr1   715000  720000" "chr1   755000  760000" 11

# well, we need to remove the double quotes
$ csvtk replace -H -t -k file1.re.txt -f 1,2 -p '(.+)' -r '{kv}' file2.txt -A | sed 's/"//g'
[INFO] read key-value file: file1.re.txt
[INFO] 11 pairs of key-value loaded
chr1    710000  715000  chr1    710000  715000  84
chr1    710000  715000  chr1    715000  720000  26
chr1    710000  715000  chr1    755000  760000  32
chr1    710000  715000  chr1    760000  765000  15
chr1    715000  720000  chr1    755000  760000  11
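The double quotes appear because each replaced value now contains tab characters, which are also the field separator, so a CSV/TSV writer must quote the field to keep it unambiguous. That is presumably why csvtk adds them (an assumption; its internals are not shown here). Python's standard `csv` module behaves the same way, which makes the effect easy to reproduce. Writing the coordinates as separate fields avoids quoting altogether, so no `sed` clean-up is needed. The helper `write_rows` is hypothetical:

```python
import csv
import io


def write_rows(rows):
    """Write rows as tab-separated text and return the result as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Each coordinate field holds "chr1<TAB>710000<TAB>715000": it contains the
# delimiter, so the default QUOTE_MINIMAL mode wraps it in double quotes.
joined = [["chr1\t710000\t715000", "chr1\t710000\t715000", "84"]]
print(write_rows(joined), end="")

# Splitting the coordinates into separate fields leaves nothing to quote.
split_fields = [["chr1", "710000", "715000", "chr1", "710000", "715000", "84"]]
print(write_rows(split_fields), end="")
```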
