bedtools annotate
3.1 years ago
alexmondaini ▴ 20

Sorry for the trivial question, but I can't understand how the fraction reported for each feature by bedtools annotate is computed. Example from the website:

$ cat variants.bed
chr1 100  200   nasty 1  -
chr2 500  1000  ugly  2  +
chr3 1000 5000  big   3  -

$ cat genes.bed
chr1 150  200   geneA 1  +
chr1 175  250   geneB 2  +
chr3 0    10000 geneC 3  -

$ cat conserve.bed
chr1 0    10000 cons1 1  +
chr2 700  10000 cons2 2  -
chr3 4000 10000 cons3 3  +

$ cat known_var.bed
chr1 0    120   known1   -
chr1 150  160   known2   -
chr2 0    10000 known3   +

$ bedtools annotate -i variants.bed -files genes.bed conserve.bed known_var.bed
chr1  100     200     nasty   1       -       0.500000        1.000000        0.300000
chr2  500     1000    ugly    2       +       0.000000        0.600000        1.000000
chr3  1000    5000    big     3       -       1.000000        0.250000        0.000000

Why is the result in the third annotation column of the first row 0.3?

chr1  100     200     nasty   1       -       0.500000        1.000000        0.300000

Is it 30 bases of overlap divided by 100?

$ cat known_var.bed
chr1 0    120   known1   -
chr1 150  160   known2   -

/

$ cat variants.bed
chr1 100  200   nasty 1  -

But then the 0.5 should be another value, right?

$ cat genes.bed
chr1 150  200   geneA 1  +
chr1 175  250   geneB 2  +

/

$ cat variants.bed
chr1 100  200   nasty 1  -

Should be 75/100 or .75.

3.1 years ago

It appears you need to compute the fractions only for intervals that are on the same strand.

No, I would just like to understand the output of this example.

It seems that you asked:

Why is the result in the third annotation column of the first row 0.3?

but you also answered it yourself.

You have 20 bp from known1 + 10 bp from known2; 20+10=30, divided by 100, gives you 0.3.

What is the question that you don't know the answer to?

I think I get the point now. For the third column, known1 and known2 add up to 30. However, the same does not apply to the first column of the first row. The comparison would be the following:

chr1 100  200   nasty 1  -

with

chr1 150  200   geneA 1  +
chr1 175  250   geneB 2  +

Here we cannot just add 50 for geneA to 25 for geneB, since the region 175-200 is covered by both geneA and geneB. Counting each base only once, the covered region is 150-200, i.e. 50/100 = 0.5. On the other hand:

chr1 0    120   known1   -
chr1 150  160   known2   -

known1 and known2 are not overlapping regions, hence 30.

Right??
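
A minimal Python sketch of that reading, to check the arithmetic. It assumes the reported fraction is the share of the query interval covered by the union of the features in each file, ignoring strand; `covered_fraction` is a hypothetical helper written for this post, not part of bedtools.

```python
def covered_fraction(query, features):
    """Fraction of `query` (start, end) covered by the union of `features`.

    Assumption: every base of the query counts at most once, no matter how
    many features overlap it. Strand is ignored.
    """
    q_start, q_end = query
    # Clip each feature to the query and drop features that do not overlap it.
    clipped = sorted(
        (max(start, q_start), min(end, q_end))
        for start, end in features
        if start < q_end and end > q_start
    )
    covered = 0
    cur_end = q_start  # right edge of the region counted so far
    for start, end in clipped:
        if end > cur_end:
            # Only count the part that extends past what is already covered.
            covered += end - max(start, cur_end)
            cur_end = end
    return covered / (q_end - q_start)


nasty = (100, 200)                   # chr1 row of variants.bed
genes = [(150, 200), (175, 250)]     # chr1 rows of genes.bed (overlapping)
known = [(0, 120), (150, 160)]       # chr1 rows of known_var.bed (disjoint)

print(covered_fraction(nasty, genes))  # 0.5
print(covered_fraction(nasty, known))  # 0.3
```

Under that assumption both values match the first row of the example output: the overlapping genes contribute 50 distinct bases, and the two disjoint known variants contribute 20 + 10 = 30.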
