'Incomplete' intersect between two files
8 months ago
Lila M ★ 1.3k

Hello everyone!

I'm trying to understand what I am doing wrong and how to do it correctly :P

I've downloaded the human annotation (gene and gene prediction) from here. After selecting the columns I need, it looks like this (file ref):

head ref 
chr1    11873   14408   HGNC:37102      DDX11L1 ENSG00000223972
chr1    14361   29369   HGNC:38034      WASH7P  ENSG00000227232
chr1    17368   17435   HGNC:50039      MIR6859-1       ENSG00000278267
chr1    29773   35417   HGNC:52482      MIR1302-2HG     ENSG00000243485

I also have an additional BED file (file problem) that looks like this:

chr1    2561658 2561779 -   1
chr1    2562271 2562325 -   1
chr1    2562542 2562646 -   1
chr1    2563148 2566097 -   1
chr1    2566113 2566344 -   1
chr1    2569378 2569767 -   1

My aim is to annotate file problem with the genes in file ref

I've tried this approach:

bedtools intersect -a ref  -b problem  -wb > results

But the output is not what I want:

chr1    2561658 2561779 HGNC:11912  TNFRSF14    ENSG00000157873
chr1    2562271 2562325 HGNC:11912  TNFRSF14    ENSG00000157873
chr1    2562542 2562646 HGNC:11912  TNFRSF14    ENSG00000157873
chr1    2563148 2563828 HGNC:11912  TNFRSF14    ENSG00000157873
chr1    2689146 2689496 HGNC:34297  TTC34   ENSG00000215912
chr1    2747459 2747510 HGNC:34297  TTC34   ENSG00000215912

If you look at the coordinates, they are not the original coordinates from file problem. My objective is to keep the original coordinates from file problem and add the gene information even if only a small portion overlaps. I'm not sure whether this output is possible to get?

Thanks!

bedtools bed intersect • 570 views
8 months ago
Lila M ★ 1.3k

I think I managed to do it using this command:

bedtools intersect -a ref  -b problem  -wb > result.bed 

(I will leave the question up in case someone else struggles like I did.)

The command you pasted here is identical to the command you said has non-desirable output. Did you change your mind, or paste the wrong one?

Yes, it is the same command: I hadn't realised that the original problem coordinates are kept in the last columns of the output.
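
For anyone landing here later, a minimal sketch of how those last columns could be pulled to the front. It assumes the -wb output has the six ref columns (with start and end clipped to the overlap) followed by the five original problem columns. The sample row is reconstructed from rows shown in this thread, not copied from a real run, and the reorder helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# One -wb output row, reconstructed from the rows shown in the thread.
# Assumption: columns 1-6 come from "ref" (start/end clipped to the overlap)
# and columns 7-11 are the untouched entry from "problem".
row=$(printf 'chr1\t2563148\t2563828\tHGNC:11912\tTNFRSF14\tENSG00000157873\tchr1\t2563148\t2566097\t-\t1')

# Hypothetical helper: print the original "problem" columns (7-11) first,
# followed by the gene annotation columns (4-6).
reorder() {
    awk 'BEGIN { FS = OFS = "\t" } { print $7, $8, $9, $10, $11, $4, $5, $6 }'
}

annotated=$(printf '%s\n' "$row" | reorder)
printf '%s\n' "$annotated"
```

On a full file, the same helper would be used as `reorder < result.bed`. An alternative that may be simpler is to swap the inputs and ask for both original entries, i.e. `bedtools intersect -a problem -b ref -wa -wb`, which should report the unclipped problem intervals first; adding `-loj` should also keep problem intervals that overlap no gene.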
