Hi, I have two tab delimited files. I am looking to compare these two files and need to grep one upstream & and one downstream genes from file 2 based on the locations in column 2 and 3 in file 1. Please suggest any script.
File 1:
Node1 3223 3245
Node2 4556 4599
File 2:
Node3 3199 3210 ID=r4567
Node4 3260 3290 ID=r4587
Node5 3310 3340 ID=r6754
Node6 4410 4490 ID=r0987
Node7 4491 4550 ID=r2345
Node8 4610 4660 ID=6789
Node9 4670 4710 ID=3245
OUTPUT:
Node1 3199 3210 ID=r4567 Node4 3260 3290 ID=r4587
Node2 4491 4550 ID=r2345 Node8 4610 4660 ID=6789
If you want the closest feature up- or downstream, bedtools closest could be useful.
Hi, I am using bedtools, however it is showing error- I have used sort command for sorting files.
ERROR: Sort order was unspecified, and file xac.bed is not sorted lexicographically. Please rerun with the -g option for a genome file. See documentation for details.
Did you sort both files using the above command? Are both files in
bed
format (chr, start, stop, name, score)? Could you post the first few lines of both files?Yes, I sorted both files, Please see some lines of my both files.
file1.bed
NODE_103_length_15107_cov_6.54451 3214 3273
NODE_104_length_1587_cov_15.5242 1455 1420
NODE_10_length_119210_cov_21.822 49628 49660
NODE_10_length_119210_cov_21.822 49631 49663
file2.bed
NODE_1000_length_1519_cov_1.91598 197 568
NODE_1000_length_1519_cov_1.91598 525 620
NODE_1000_length_1519_cov_1.91598 738 1169
NODE_1000_length_1519_cov_1.91598 1227 1502
NODE_1001_length_1515_cov_1.35616 306 728
NODE_1001_length_1515_cov_1.35616 967 1476
NODE_1002_length_1512_cov_1.62938 33 650
NODE_1002_length_1512_cov_1.62938 671 1036
Even though the file names end in
.bed
, those aren't in.bed
format. You need to reformat them:https://genome.ucsc.edu/FAQ/FAQformat.html#format1
I have prepared my files in .bed format and sorted by following, still I am not able to get answers, Please see my error below:
File 1:
chr start stop name
NODE_55_length_30858_cov_27.421 19901 19951 bg1
NODE_55_length_30858_cov_27.421 19900 19930 bg2
NODE_55_length_30858_cov_27.421 17578 17613 bg3
File 2:
chr start stop name
NODE_1_length_424572_cov_13.6788 99 174 ID=LGE0059
NODE_1_length_424572_cov_13.6788 179 254 ID=LGE0059
ERROR: Sort order was unspecified, and file sorted-gff.bed is not sorted lexicographically. Please rerun with the -g option for a genome file. See documentation for details.
I'm not sure I follow what you're asking, it looks like you might have had issues formatting the code in your question.
Are you interested in printing nodes where File 2 has a greater value than File 1? Are you interested in printing File 2 for only nodes where File 1 passes a criteria? Are the same number of nodes in each file?
Hi, I improved my query now, please consider now.