So these coordinates, depending on orientation, always point to the start codon of the gene or to its termination codon. Given this information, is it fair to assume that the region between the end of one gene and the start of the following gene can be treated as intergenic? I am particularly interested in identifying the 5' UTRs of these genes based on the coordinates. Extracting the genes computationally is not the problem. What I can't think of is an effective way to tell 3' UTRs apart from 5' UTRs in the region separating two tandem genes, since intergenic regions vary in size and some genes are grouped together in operons.
If you have any ideas that can help me brainstorm this problem, I will be grateful.
The information about the UTRs of E. coli is indeed not in this table or in any other simple annotation table. These can be found experimentally. You can download the 5' UTRs from RegulonDB, for instance: there are 3 files with transcription start site locations ("Transcription start sites experimentally determined in the laboratory of Dr. Morett"). Beware that there might be different values for each gene. The 3' UTRs can be obtained from RNA-seq experiments as well, but I didn't find a simple table that describes them. You can try to build the transcripts yourself from published RNA-seq experiments. You can use Rockhopper for this purpose; it's really friendly and gives you a simple table with transcription start and termination, and translation start and termination, for each gene.
You should be aware that there are alternative transcription start sites and termination sites, so the UTRs can be different for different mRNA molecules.
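To make the coordinate arithmetic concrete, here is a rough sketch of how a 5' UTR could be derived once you have a TSS position and the start codon coordinate from the annotation table. The function names and all coordinates are made up for illustration, and I'm assuming 1-based inclusive coordinates where the start codon coordinate is the first base of the start codon (the leftmost base of the gene on the + strand, the rightmost on the - strand). A gene with several TSSs simply yields several UTRs.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (left, right), 1-based inclusive


def five_prime_utr(tss: int, start_codon: int, strand: str) -> Optional[Interval]:
    """Return the 5' UTR interval for one TSS, or None if the TSS is not
    upstream of the start codon (this also covers a leaderless transcript,
    where the TSS coincides with the start codon)."""
    if strand == "+":
        # Upstream means a smaller coordinate on the + strand.
        return (tss, start_codon - 1) if tss < start_codon else None
    if strand == "-":
        # Upstream means a larger coordinate on the - strand.
        return (start_codon + 1, tss) if tss > start_codon else None
    raise ValueError("strand must be '+' or '-'")


def utrs_for_gene(tss_list: List[int], start_codon: int, strand: str) -> List[Interval]:
    """Collect one 5' UTR per usable TSS (alternative TSSs give several UTRs)."""
    utrs = (five_prime_utr(t, start_codon, strand) for t in tss_list)
    return [u for u in utrs if u is not None]


def utr_length(utr: Interval) -> int:
    return utr[1] - utr[0] + 1


# Hypothetical + strand gene with its start codon at 1000 and three candidate TSSs.
plus_utrs = utrs_for_gene([950, 900, 1010], start_codon=1000, strand="+")
# Hypothetical - strand gene with its start codon at 5000 and one TSS.
minus_utrs = utrs_for_gene([5080], start_codon=5000, strand="-")
```

The TSS at 1010 is dropped because it falls inside the coding region. Whether a given TSS really belongs to the gene downstream of it is a separate question that coordinates alone can't answer.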
Another issue is the direction of the genes. Two genes can share the same terminator (the poly-U part) if they are convergent (---><---), or the same promoter if they are divergent (<----->). If they are in the same orientation, they might reside on the same transcription unit.
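As a starting point for sorting this out from the annotation table, you could label each pair of neighbouring genes by orientation and pull out the gap between them. This is only a sketch: the `Gene` record, the function names and the coordinates are invented for the example, coordinates are assumed to be 1-based inclusive with left < right, and the pair that wraps around the circular chromosome is ignored.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    left: int    # smallest coordinate of the coding region
    right: int   # largest coordinate of the coding region
    strand: str  # "+" or "-"


def classify_pair(a: Gene, b: Gene) -> str:
    """Classify two neighbouring genes, where a lies to the left of b."""
    if a.strand == "+" and b.strand == "-":
        return "convergent"  # ---><--- : the gap may hold two 3' UTRs
    if a.strand == "-" and b.strand == "+":
        return "divergent"   # <--- ---> : the gap may hold two 5' UTRs
    return "tandem"          # same strand: possibly one transcription unit


def intergenic(a: Gene, b: Gene) -> Optional[Tuple[int, int]]:
    """Return the gap between a and b, or None if they touch or overlap."""
    if b.left - a.right <= 1:
        return None
    return (a.right + 1, b.left - 1)


def neighbour_table(genes: List[Gene]) -> List[Tuple[str, str, str, Optional[Tuple[int, int]]]]:
    """One row per pair of adjacent genes, ordered by left coordinate."""
    ordered = sorted(genes, key=lambda g: g.left)
    return [
        (a.name, b.name, classify_pair(a, b), intergenic(a, b))
        for a, b in zip(ordered, ordered[1:])
    ]


# Made-up genes, not real E. coli coordinates.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 450, 900, "+"),
    Gene("geneC", 950, 1300, "-"),
    Gene("geneD", 1290, 1800, "+"),
]
rows = neighbour_table(genes)
```

The label only narrows down what the gap can contain. For tandem pairs you still need TSS and terminator data to decide whether the genes are co-transcribed or whether the gap holds a 3' UTR followed by a 5' UTR.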
I know it's a mess.
Good luck.
written 10.3 years ago by Asaf (updated 3.0 years ago by Ram)
It sure is a mess. An interesting type of mess. Thanks for the clarification regarding the possibility that convergent and divergent genes could share either the same terminator or the same promoter. That was insightful.
In the UTR table, I noticed that some genes, like CsrA, have several 5' UTRs. All of these UTRs have the same start position but different end positions, and therefore different lengths. From an experimental perspective, what do you think has happened here? Does that mean the 5' UTRs could not be resolved for such genes, or that such genes simply tend to have more than one UTR?
I think it's real; these are alternative TSSs. Looking at the gene in an RNA-seq dataset I'm working on right now, you can see that there appear to be different start sites. Pay attention to the coordinates that match two of the TSSs in the first table on RegulonDB.