6.1 years ago
i.jabre26
Hello,
I have two files:
File 1:
chr5 20311169 20311244 5 20311177 20311251 K00230:40:HNWJLBBXX:4:1101:1002:35936 255 + - 6610258.00
chr5 26610220 26610295 5 26610221 26610296 K00230:40:HNWJLBBXX:4:1101:1022:24155 255 + - 220311210.00
File 2:
chr5 20311200 20311220 Nucleosome:1 110 5.0 39.9 MainPeak 1.43492858 0.68583064
chr5 801 861 Nucleosome:2 70 1.0 5.4 MainPeak 0.17076187 0.806538035
chr5 1021 1091 Nucleosome:3 80 2.0 14.4 MainPeak 0.42430331 0.481579895
chr5 1181 1251 Nucleosome:4 80 1.0 7.5 MainPeak 0.1362587 0.32626102999999995
I'm interested in printing rows from file 1, using Python code, if the value in the 11th column falls within the range between start and end (2nd and 3rd columns) declared in the second file. As a position is only unique within a given chromosome (chr), it first has to be tested whether the chromosomes are identical. Hence my desired output is:
chr5 20311169 20311244 5 20311177 20311251 K00230:40:HNWJLBBXX:4:1101:1002:35936 255 + - 20311210.00
I have tried awk code. It works perfectly fine, but it is very, very slow!
The files I'm testing (from which I need to print the rows) are around 4 GB.
I would highly appreciate some Python or Perl code.
Thanks!
Reformat File1 to generate a BED file with awk and use bedtools intersect ...
Show them please.
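For anyone who would rather do the reformatting step in Python than awk, here is a minimal sketch. It assumes whitespace-separated columns, that the 11th column holds a 1-based position written as a float, and that it is acceptable to pack the original row into the BED name column; the function name `file1_to_bed` and the `|` separator are hypothetical choices, not something from the thread.

```python
def file1_to_bed(lines, col=10):
    """Yield one BED line (chrom, start, end, name) per row of file 1.

    Assumption: the value in column `col` (0-based index, so 10 = 11th
    column) is a 1-based position, converted here to a 0-based,
    half-open 1 bp interval as BED expects.
    """
    for line in lines:
        fields = line.split()
        if len(fields) <= col:
            continue  # skip blank or truncated rows
        pos = int(float(fields[col]))  # "20311210.00" -> 20311210
        name = "|".join(fields)        # keep the original row recoverable
        yield "\t".join([fields[0], str(pos - 1), str(pos), name])
```

Written to a file, the result could then be passed to something like `bedtools intersect -u -a file1.bed -b file2.bed`, provided file 2 is tab-delimited and its coordinates follow the same convention.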
You are interested in:
you tried:
?
The values in the 11th column of file 1 are 6610258.00 and 220311210.00. The start and end coordinates in the second file do not overlap with the 11th column of file 1 at all. Moreover, it is interesting to see that the coordinates in the 11th column of file 1 are floats. The last-column value in the output, "20311210.00", appears in neither input file, only in the output.
All in all, this seems to be an XY problem to me.
Here is the Python code for the logical problem:
Print all lines from file 2 when the last (11th) column of file 1 is between the start (2nd) and stop (3rd) coordinates in the second file:
file1:
file 2:
code:
output:
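A minimal sketch of this kind of check, written to print the matching rows of file 1 as the original question asks. It assumes whitespace-separated columns, inclusive start and end coordinates, and that file 2 fits in memory; the function names `load_intervals` and `matching_rows` are hypothetical. File 2 is indexed once per chromosome and each file 1 row is answered with a binary search, so the large file is only streamed through once.

```python
import bisect
from collections import defaultdict


def load_intervals(lines):
    """Index the (start, end) intervals of file 2 by chromosome."""
    by_chrom = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated rows
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [start for start, _ in intervals]
        # Running maximum of the end coordinates, so that nested or
        # overlapping intervals are still handled correctly.
        max_ends, running = [], float("-inf")
        for _, end in intervals:
            running = max(running, end)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)
    return index


def matching_rows(file1_lines, index, col=10):
    """Yield file 1 rows whose column `col` lies inside any interval
    on the same chromosome (bounds treated as inclusive: an assumption)."""
    for line in file1_lines:
        fields = line.split()
        if len(fields) <= col or fields[0] not in index:
            continue
        pos = float(fields[col])
        starts, max_ends = index[fields[0]]
        i = bisect.bisect_right(starts, pos)  # intervals with start <= pos
        if i and max_ends[i - 1] >= pos:
            yield line.rstrip("\n")
```

With real files, open file handles can be passed directly, for example `index = load_intervals(open("file2.txt"))` followed by a loop over `matching_rows(open("file1.txt"), index)`; the file names here are placeholders.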
Thank you for sharing the code. It is helpful.
This is the awk code I used. It works perfectly fine, but it is very slow!
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
Thank you for the information.
Python and Perl will not be faster than awk ...