Hello everyone!
I'm trying to understand what I'm doing wrong and how to do it correctly :P
I've downloaded the human annotation (gene and gene prediction) from here; after selecting the columns I need, it looks like this (file ref):
head ref
chr1 11873 14408 HGNC:37102 DDX11L1 ENSG00000223972
chr1 14361 29369 HGNC:38034 WASH7P ENSG00000227232
chr1 17368 17435 HGNC:50039 MIR6859-1 ENSG00000278267
chr1 29773 35417 HGNC:52482 MIR1302-2HG ENSG00000243485
I also have an additional BED file (file problem) that looks like this:
chr1 2561658 2561779 - 1
chr1 2562271 2562325 - 1
chr1 2562542 2562646 - 1
chr1 2563148 2566097 - 1
chr1 2566113 2566344 - 1
chr1 2569378 2569767 - 1
My aim is to annotate file problem with the genes in file ref.
I've tried this approach:
bedtools intersect -a ref -b problem -wb > results
But the output is not what I want:
chr1 2561658 2561779 HGNC:11912 TNFRSF14 ENSG00000157873
chr1 2562271 2562325 HGNC:11912 TNFRSF14 ENSG00000157873
chr1 2562542 2562646 HGNC:11912 TNFRSF14 ENSG00000157873
chr1 2563148 2563828 HGNC:11912 TNFRSF14 ENSG00000157873
chr1 2689146 2689496 HGNC:34297 TTC34 ENSG00000215912
chr1 2747459 2747510 HGNC:34297 TTC34 ENSG00000215912
If you look at the coordinates, they are not the original coordinates from file problem. My objective is to keep the original coordinates from file problem and add the gene information whenever even a small portion overlaps. Is it possible to get this kind of output?
Thanks!
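To make the goal concrete, here is a minimal pure-Python sketch (not bedtools itself) of the output described above: each problem interval is kept with its original coordinates, and the columns of every overlapping ref gene are appended, even if the overlap is a single base. The helper names (`parse_bed`, `annotate`, `to_bed_line`) and the `keep_unmatched` switch are hypothetical, and the first problem interval in the example is made up so that it overlaps the ref rows shown above. In bedtools terms this should correspond to giving problem as `-a` and ref as `-b` with `-wa -wb` (`-wa` writes the original A entry, `-wb` appends the original B entry), with `-loj` as the rough analogue of `keep_unmatched`.

```python
# Minimal sketch: annotate each interval in 'problem' with every overlapping gene
# in 'ref', keeping the problem coordinates as they are (no clipping).
# Pure Python with a naive O(n*m) loop: fine for small files, not genome-wide data.


def parse_bed(text):
    """Split whitespace-separated BED lines into lists; start/end become ints."""
    records = []
    for line in text.strip().splitlines():
        fields = line.split()
        fields[1], fields[2] = int(fields[1]), int(fields[2])
        records.append(fields)
    return records


def annotate(problem, ref, keep_unmatched=False):
    """Return problem record + ref record for each overlapping pair.

    BED intervals are half-open, so two intervals on the same chromosome
    overlap when a.start < b.end and b.start < a.end (even by one base).
    With keep_unmatched=True, problem intervals without any gene are kept and
    padded with '.' (an illustrative placeholder, not bedtools' exact -loj format).
    """
    width = len(ref[0]) if ref else 0
    out = []
    for p in problem:
        hits = [r for r in ref if r[0] == p[0] and p[1] < r[2] and r[1] < p[2]]
        out.extend(p + r for r in hits)  # one output line per overlapping gene
        if keep_unmatched and not hits:
            out.append(p + ["."] * width)
    return out


def to_bed_line(record):
    """Join a record back into a tab-separated BED line."""
    return "\t".join(str(x) for x in record)


REF = """
chr1 11873 14408 HGNC:37102 DDX11L1 ENSG00000223972
chr1 14361 29369 HGNC:38034 WASH7P ENSG00000227232
chr1 17368 17435 HGNC:50039 MIR6859-1 ENSG00000278267
chr1 29773 35417 HGNC:52482 MIR1302-2HG ENSG00000243485
"""

# First line: hypothetical interval chosen to overlap the ref head above.
# Second line: the first interval from file problem.
PROBLEM = """
chr1 14000 15000 - 1
chr1 2561658 2561779 - 1
"""

if __name__ == "__main__":
    for rec in annotate(parse_bed(PROBLEM), parse_bed(REF), keep_unmatched=True):
        print(to_bed_line(rec))
```

With `keep_unmatched=True` the hypothetical interval is printed twice (once with DDX11L1 and once with WASH7P, since it overlaps both), always with its own coordinates 14000 and 15000, while the real problem interval is printed once with '.' padding because none of the four ref rows shown here reach 2.56 Mb.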
The command you pasted here is identical to the command you said gives undesired output. Did you change your mind, or paste the wrong one?
Yes, it is because I hadn't realised that the last columns keep the original problem coordinates.
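Since the `-wb` columns already hold the original problem record, the existing output can also simply be rearranged. Below is a hedged Python sketch, assuming file ref has exactly the six columns shown and the output is tab-separated (as bedtools writes it). `reorder_wb_line` is a hypothetical helper, and the example line is the clipped TNFRSF14 row above joined with the problem interval it presumably came from (the only one starting at 2563148).

```python
# Sketch: rearrange lines produced by
#   bedtools intersect -a ref -b problem -wb
# so the original problem record comes first and the gene annotation follows.
# Assumes ref has 6 columns, so columns 7 onward are the problem record written by -wb.

N_REF_COLS = 6  # chrom, start, end, HGNC id, symbol, Ensembl id (as in file ref)


def reorder_wb_line(line, n_ref_cols=N_REF_COLS):
    """Return original problem fields + gene fields, dropping the clipped coordinates."""
    fields = line.rstrip("\n").split("\t")
    ref_part, problem_part = fields[:n_ref_cols], fields[n_ref_cols:]
    # ref_part[:3] is the clipped overlap interval; keep only the annotation columns.
    return problem_part + ref_part[3:]


# Clipped row from the output above plus the -wb columns it is assumed to pair with.
EXAMPLE = (
    "chr1\t2563148\t2563828\tHGNC:11912\tTNFRSF14\tENSG00000157873"
    "\tchr1\t2563148\t2566097\t-\t1"
)

if __name__ == "__main__":
    print("\t".join(reorder_wb_line(EXAMPLE)))
```

The rearranged line keeps the original end coordinate 2566097 instead of the clipped 2563828, followed by the HGNC id, symbol and Ensembl id.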