I was looking at a simple visualization program that, among other things, visualized a BED file with respect to the input FASTA reference sequence. I noticed that there were separate tracks for annotations and that there were very many overlapping regions. I'm having a hard time understanding that. Can someone explain to me how coding regions can overlap in a BED file?
edit: Since I see that my not-so-clear question created some confusion: basically, I'm asking if someone can explain how it is possible for coding regions to overlap in a BED file.
For example, these are a couple of lines from my Escherichia_coli_id52271.bed file:
1 5262 5712 E4287_5.mRNA1 449 + . . . 1 450 0 mRNA
1 5262 5712 E4287_5 449 + . . . 1 450 0 gen
...
1 6729 7644 E4287_7 914 + . . . 1 915 0 gene
1 6729 7644 b0030 914 + . . . 1 915 0 CDS
There are many (unrelated) questions here without context (what is "separate tracks for annotation"? "how can coding regions overlap in a BED file"?).
Okay, yeah, maybe my question wasn't focused enough. My main question is how coding regions can overlap in a BED file.
Multiple transcripts, interleaved genes: the question is why wouldn't they overlap? Please show us an example.
I've edited my question with a couple of examples. In the examples provided, the regions don't just overlap, they are exactly the same, and I'm having a hard time understanding that.
What's the definition of this BED file? I see a coordinate for an mRNA coded by a gene. As it's in a bacterium, of course it's the very same coordinate.
Your file looks like a modified GTF file.
What do you mean by definition?
What's the meaning of the columns 'mRNA' and 'gen'?
I'm not sure, I downloaded the bed file from here. I also found it weird that it had this extra column but I didn't know what to make of it.
Maybe the answer doesn't have to be specific to this file: what is a simple explanation for why two lines in a BED file would have the same coordinates, or why they would overlap?
Again, if one line is for an mRNA and the other is for a gene, it's normal.
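To see this in the file itself, one can group the lines by coordinates and list the feature types that share each interval. This is only a minimal sketch: it assumes whitespace-separated columns with the name in the fourth column and the feature type in the last column, as in the lines quoted in the question, and the function name `group_by_interval` is made up for illustration.

```python
from collections import defaultdict

# Lines copied from the question; the last column is assumed to be the feature type.
bed_lines = """\
1 5262 5712 E4287_5.mRNA1 449 + . . . 1 450 0 mRNA
1 5262 5712 E4287_5 449 + . . . 1 450 0 gen
1 6729 7644 E4287_7 914 + . . . 1 915 0 gene
1 6729 7644 b0030 914 + . . . 1 915 0 CDS
"""

def group_by_interval(text):
    """Map (chrom, start, end) -> list of (name, feature_type)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        groups[(chrom, start, end)].append((name, fields[-1]))
    return dict(groups)

groups = group_by_interval(bed_lines)
for (chrom, start, end), features in groups.items():
    print(chrom, start, end, features)
```

Each interval ends up with two entries: the gene itself and one of its products (mRNA or CDS), which is why the coordinates are identical.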
What does that mean exactly? Sorry for bothering you this much
They are standard feature types (describing a gene and its products) that are defined in the GTF/GFF files from which your BED file is likely derived: https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/#attributes and http://gmod.org/wiki/GFF2#Using_the_Group_field_for_simple_features
Gene: https://en.wikipedia.org/wiki/Gene#Structure_and_function
Okay, thank you very much. But besides this being a BED file derived from GTF/GFF, what else could be a reason for overlapping features?
In a word, Biology.
mRNA is a product produced from a gene. In the case of bacteria, the sizes of these two may be equivalent since there is no splicing (exons/introns).
But also, a gene can have multiple transcripts, correct?
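For the general case of partial overlaps, such as several transcripts of one gene, a rough sketch of an overlap test may help. It assumes standard BED coordinates (0-based start, end not included). The feature names and coordinates below are invented purely for illustration and do not come from the file in the question.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

# Hypothetical features: two transcripts of one gene plus an adjacent gene.
features = {
    "geneA": ("1", 100, 900),
    "geneA.mRNA1": ("1", 100, 900),
    "geneA.mRNA2": ("1", 300, 900),
    "geneB": ("1", 900, 1200),
}

names = sorted(features)
pairs = [
    (x, y)
    for i, x in enumerate(names)
    for y in names[i + 1:]
    if overlaps(features[x], features[y])
]
print(pairs)
```

Here the gene and both of its transcripts overlap one another, while the adjacent gene starting exactly where the first one ends does not, because the end coordinate is exclusive.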