Let's say I have a transcript NM_12345 whose first exon starts at position 1000 on chromosome 15 in hg19.
What if in hg38 they had discovered the following: "oh, the ten repeats of length 5 at the beginning of chromosome 15 are actually twenty repeats (in most people). We should insert those nucleotides at the beginning of chromosome 15."
This means that my first exon would no longer start at position 1000 but at position 1000 + 10*5 = 1050 (ten extra repeats of 5 nucleotides each).
So a BED file created against hg19 should not be used for hg38-based work, right?
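To make the arithmetic concrete, here is a toy sketch of how a coordinate moves when bases are inserted upstream of it. The function names and the insertion point are hypothetical, and this assumes the only difference between the assemblies is a single insertion. Real hg19 to hg38 conversion involves many changes and is done with chain files and tools such as UCSC liftOver or CrossMap.

```python
def shift_for_insertion(pos: int, ins_pos: int, ins_len: int) -> int:
    """Map a position from the old assembly to the new one, assuming
    (hypothetically) that ins_len bases were inserted right before ins_pos
    and nothing else changed."""
    if pos >= ins_pos:
        # Everything at or after the insertion point moves right.
        return pos + ins_len
    # Positions upstream of the insertion are unaffected.
    return pos


def shift_interval(start: int, end: int, ins_pos: int, ins_len: int) -> tuple[int, int]:
    """Shift both ends of an interval (e.g. one BED line) with the same toy model."""
    return (shift_for_insertion(start, ins_pos, ins_len),
            shift_for_insertion(end, ins_pos, ins_len))


# Ten extra copies of a 5-nt repeat near the start of chr15.
# The insertion point (position 1) is an assumption for illustration.
extra = 10 * 5
print(shift_for_insertion(1000, ins_pos=1, ins_len=extra))  # 1050
```

Because every feature downstream of the insertion moves, an unconverted hg19 BED file would point at the wrong bases in hg38.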
So shouldn't BED files have a header line stating which reference genome the features refer to? Why isn't there such a line?
You can add comments to a BED file by starting a line with "#", so you can include metadata such as the reference genome. Comment lines should be ignored by BED parsers, but they may be useful for humans dealing with the files.
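A minimal sketch of this idea: write the assembly name into a "#" comment line and read it back while skipping comment, "track" and "browser" lines. The "# genome=hg19" key=value form is an ad hoc convention assumed here, not part of the BED format, and the exon end coordinate (1200) is made up.

```python
import io


def write_bed(handle, assembly, records):
    # "# genome=..." is a hypothetical convention; BED itself defines no such field.
    handle.write(f"# genome={assembly}\n")
    for chrom, start, end, name in records:
        handle.write(f"{chrom}\t{start}\t{end}\t{name}\n")


def read_bed(handle):
    """Return (metadata dict, list of records), skipping non-feature lines."""
    meta = {}
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Collect key=value pairs from comment lines as metadata.
            body = line[1:].strip()
            if "=" in body:
                key, value = body.split("=", 1)
                meta[key.strip()] = value.strip()
            continue
        if line.startswith(("track", "browser")):
            continue
        chrom, start, end, *rest = line.split("\t")
        records.append((chrom, int(start), int(end), *rest))
    return meta, records


buf = io.StringIO()
write_bed(buf, "hg19", [("chr15", 1000, 1200, "NM_12345_exon1")])
buf.seek(0)
meta, recs = read_bed(buf)
print(meta)  # {'genome': 'hg19'}
print(recs)  # [('chr15', 1000, 1200, 'NM_12345_exon1')]
```

A pipeline could then compare meta.get("genome") with the assembly it expects and refuse to proceed on a mismatch; tools that don't know the convention simply skip the comment line.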