24 months ago
baijiangshan9726
Hi, I have a file1:
60000 498177
65000 498178
70000 498179
75000 498180
80000 498181
85000 498182
90000 498183
95000 498184
100000 498185
105000 498186
110000 498187
115000 498188
and a file2:
60000 70000 1
70000 70000 20
70000 85000 1
80000 85000 1
110000 110000 23
115000 115000 3
115000 120000 1
120000 120000 2
80000 125000 1
I want to use the information in file1 to substitute values in file2: if a value in the 1st or 2nd column of file2 matches a value in the 1st column of file1, replace it with the corresponding value from the 2nd column of file1. The final result should look like this:
498177 498179 1
498179 498179 20
498179 498182 1
498181 498182 1
498187 498187 23
498188 498188 3
498188 498189 1
498189 498189 2
498181 498190 1
498190 498190 2
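In the sample above, file1 looks like a regular grid: positions start at 60000, step by 5000, and the IDs increase by 1 per bin. If that holds for the whole real file (an assumption; it will not hold if file1 spans several chromosomes or has irregular bins), each ID can be computed arithmetically with no lookup at all. This rule also reproduces the expected IDs for 120000 (498189) and 125000 (498190), which do not appear in the file1 excerpt. The constants below are read off the sample lines and are assumptions to check against the real file1.

```python
# Sketch: compute bin IDs arithmetically instead of looking them up.
# ASSUMPTION: file1 is a regular grid starting at START_POS with step
# BIN_SIZE and consecutive IDs from START_ID (values taken from the sample).
START_POS = 60000
START_ID = 498177
BIN_SIZE = 5000


def pos_to_id(pos: int) -> int:
    """Map a bin start position to its ID on the assumed regular grid."""
    offset, remainder = divmod(pos - START_POS, BIN_SIZE)
    if remainder != 0 or offset < 0:
        raise ValueError(f"{pos} is not on the assumed grid")
    return START_ID + offset


def convert_line(line: str) -> str:
    """Replace columns 1 and 2 of a 3-column file2 line with bin IDs."""
    a, b, count = line.split()
    return f"{pos_to_id(int(a))} {pos_to_id(int(b))} {count}"


if __name__ == "__main__":
    for row in ["60000 70000 1", "115000 120000 1", "80000 125000 1"]:
        print(convert_line(row))
```

Raising on off-grid positions is a deliberate choice here, so that a wrong grid assumption fails loudly instead of silently producing bad IDs.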
I wrote a Python script, but it is very slow because file2 is very big (it has 8 million rows). How can I do this much faster? Thanks!
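The original script is not shown, so the cause of the slowness is a guess; a common one is searching file1 for every row of file2. A minimal sketch, assuming file1 fits in memory: load it once into a dict so each lookup takes constant time on average, then stream file2 line by line so its 8 million rows are never held in memory at once. Keeping values that are missing from file1 unchanged is an assumption, since the question does not say what should happen to them; the script name in the usage comment is hypothetical.

```python
import sys


def load_mapping(lines):
    """Build a dict {position: bin_id} from file1 lines."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            mapping[fields[0]] = fields[1]
    return mapping


def convert_lines(lines, mapping):
    """Yield file2 lines with columns 1 and 2 replaced via the mapping.

    Values absent from the mapping are kept unchanged (an assumption).
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        fields[0] = mapping.get(fields[0], fields[0])
        fields[1] = mapping.get(fields[1], fields[1])
        yield " ".join(fields) + "\n"


def main(file1, file2, out):
    with open(file1) as fh:
        mapping = load_mapping(fh)
    # file2 is iterated lazily, so only one line is processed at a time.
    with open(file2) as src, open(out, "w") as dst:
        dst.writelines(convert_lines(src, mapping))


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python substitute.py file1 file2 out.txt
    main(*sys.argv[1:])
```

Keys are compared as strings, so a position written differently in the two files (for example with leading zeros) would not match.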
How is this related to bioinformatics? By the way, what you want is join.
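The join command pairs lines of two files on a common field and expects both inputs to be sorted on that field, which lets it match them in a single linear pass instead of searching file1 once per row. Because it matches on one field at a time, replacing both columns of file2 would presumably take two passes (join on column 1, re-sort on column 2, join again). The Python sketch below only illustrates the single-pass merge idea; it is not a reimplementation of join, and it assumes file1's keys are unique.

```python
def merge_join(left, right):
    """Inner merge join of two lists sorted by key.

    left:  (key, rest) tuples, e.g. file2 rows keyed on column 1
    right: (key, value) tuples with unique keys, e.g. file1 rows
    Yields (key, rest, value) for each left row whose key is in right.
    Unmatched rows are dropped, like join's default inner-join output.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            yield lkey, left[i][1], right[j][1]
            i += 1  # keep j: the next left row may share this key


if __name__ == "__main__":
    file1 = [("60000", "498177"), ("70000", "498179"), ("85000", "498182")]
    file2 = [("70000", "85000 1"), ("60000", "70000 1"), ("70000", "70000 20")]
    # Both sides are sorted by the key as text, as join compares fields as text.
    for row in merge_join(sorted(file2), sorted(file1)):
        print(*row)
```

Both inputs must be sorted with the same ordering; mixing a numeric sort on one side with a text sort on the other would make the merge skip matches.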
Hi, it's actually a Hi-C file; I am analyzing Hi-C data.
I still don't see how that is related to bioinformatics.
If your Python script is slow, you can divide the original file into 10 pieces (or 20, if you have that many CPU threads) and run the substitution on each of them in parallel. When they are done, concatenate the results back together. You would use the split and cat commands.
Hi, thanks so much. The join command works very well for this.