I have installed BEDTools and tried fastaFromBed, but when I ask for positions 1 to 25, the output contains positions 2 to 25 instead. How come?
I had posted that as a comment and got a first reply:
"BED format uses zero-based, half-open coordinates, so the first 25 bases of a sequence are in the range 0-25 (those bases being numbered 0 to 24). – Keith James♦ Mar 12 at 16:33"
So BED coordinates are different from GFF3 for example? How to confidently reformat columns of start-stop intervals before extracting coordinates using BEDtools?
chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
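Going from a 1-based, inclusive interval (the convention GFF3 uses) to BED therefore only changes the start: subtract 1 and leave the end alone. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration and are not part of BEDTools.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval (GFF3-style) to BED
    coordinates (0-based start, end not included)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval: {start}-{end}")
    return start - 1, end


def bed_to_one_based(chrom_start, chrom_end):
    """Convert a BED interval back to 1-based, inclusive coordinates.

    Zero-length BED intervals (chrom_start == chrom_end) have no
    1-based inclusive equivalent, so this sketch rejects them.
    """
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError(f"invalid BED interval: {chrom_start}-{chrom_end}")
    return chrom_start + 1, chrom_end


print(one_based_to_bed(1, 25))   # (0, 25)
print(bed_to_one_based(0, 100))  # (1, 100)
```

So "positions 1 to 25" should be written in the BED file as start 0, end 25.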
As for:
"So BED coordinates are different from GFF3 for example? How to confidently reformat columns of start-stop intervals before extracting coordinates using BEDtools?"
I have found it useful to think of BED coordinates as marking the spaces between the bases, rather than the bases themselves. I will try to represent this:
| A | C | G | T | A | C | G | T |
0   1   2   3   4   5   6   7   8
So if you wanted to describe the first base, it would be:
chr    0    1
and GTAC:
chr    2    6
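Python string slices follow the same "between the bases" convention (0-based start, end not included), so the two examples above can be checked directly. This is only an illustrative sketch; `bed_slice` is a hypothetical helper, not a BEDTools function.

```python
seq = "ACGTACGT"


def bed_slice(sequence, chrom_start, chrom_end):
    """Return the bases covered by a BED interval.

    BED start/end can be used unchanged as Python slice bounds.
    """
    return sequence[chrom_start:chrom_end]


first_base = bed_slice(seq, 0, 1)
gtac = bed_slice(seq, 2, 6)
print(first_base, gtac)  # A GTAC
```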
Another handy thing to note: you should always be able to subtract the start from the end to get the number of bases you are describing. Insertions are the only case where you should have start == stop, which gives a length of zero. This makes sense in this scheme, since you are really only calling out a position between two bases, where a bit of sequence has been inserted.
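As a quick sanity check on a BED file, the length rule can be written in a few lines of Python. This is a sketch that assumes tab-delimited lines with at least three columns; `bed_length` is a hypothetical name.

```python
def bed_length(line):
    """Length of a BED feature: chromEnd minus chromStart."""
    chrom, chrom_start, chrom_end = line.rstrip("\n").split("\t")[:3]
    return int(chrom_end) - int(chrom_start)


print(bed_length("chr\t2\t6"))  # 4 (the GTAC example)
print(bed_length("chr\t4\t4"))  # 0 (an insertion point between two bases)
```

If intervals converted from another format come out one base longer or shorter than expected, the start was probably not shifted correctly.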
written 13.3 years ago by Rlong, updated 3.3 years ago by Ram
You may want to see this related question on the pros/cons of different coordinate systems: What Are The Advantages/Disadvantages Of One-Based Vs. Zero-Based Genome Coordinate Systems