Hello Biostars!
I would like to extract locations (starting point and ending point) for some regions characterizing genes.
Using Biomart, for example, I can extract for a gene:
Gene_Start_(bp)
Gene_End_(bp)
Transcription_Start_Site_(TSS)
Exon_Chr_Start_(bp)
Exon_Chr_End_(bp)
5'_UTR_Start
5'_UTR_End
3'_UTR_Start
3'_UTR_End
In Biomart, by convention, locations grow from left to right.
For a gene on the positive strand, it is quite trivial to find the first exon and to compute the gene body (from the end of the first exon to the beginning of the rightmost 3'_UTR_Start).
But how should I deal with genes on the negative strand?
I know that in nature, genes on the negative strand are transcribed from right to left. What should I consider as the first exon for a gene on the negative strand?
cheers
True. My experience out of the genome browser is that it returns the actual start of the exon, which I think is the correct 5' end. Quick question: what do you mean by "In "nature" all genes are transcribed in the same direction (5' to 3') relative to the template strand"?
I just quoted the phrase the original poster used, "in nature", trying to emphasize that when transcription takes place there is no left and right. That only comes from what we chose as the coordinate system.
(But then calling it just 5' and 3' can in turn also be confusing, as now one needs to state 5' of what? The RNA is produced 5' -> 3', but the polymerase traverses the template 3' -> 5'.)
Thank you - I did not know that RNA polymerase transcribes in a strand-insensitive manner. How is the information about the translation start sites passed on to ribosomes?
Wait, that is not what I meant to say! The transcribed sequence is always 5' to 3'. What it does not do is go left to right or right to left; that is all that I meant.
If one were to obtain the sequence from the reverse strand then one would not need to go "backwards". It is only when we have a coordinate system relative to the forward strand that we need to keep track of reversing it.
OK, so polymerase always transcribes 5' to 3' on the relevant template strand (exon 1 to exon n). Our coordinate system is based on one strand, which is where the confusion originates: in adapting indexes that are based on the forward strand.
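To make the "exon 1 to exon n" point concrete, here is a minimal Python sketch of how one might put exons into transcription order. It assumes each exon is a (start, end) pair in forward-strand coordinates with start <= end, as Biomart reports them, and that strand is 1 or -1. The function names are made up for illustration; they are not part of Biomart or any library.

```python
from typing import List, Tuple

# (start, end) in forward-strand coordinates, start <= end (assumed input format)
Exon = Tuple[int, int]


def exons_in_transcription_order(exons: List[Exon], strand: int) -> List[Exon]:
    """Return exons ordered from exon 1 to exon n (hypothetical helper)."""
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Strand 1: exon 1 has the smallest coordinates (leftmost).
    # Strand -1: exon 1 has the largest coordinates (rightmost).
    return sorted(exons, key=lambda exon: exon[0], reverse=(strand == -1))


def first_exon(exons: List[Exon], strand: int) -> Exon:
    """Return exon 1 in transcription order (hypothetical helper)."""
    return exons_in_transcription_order(exons, strand)[0]


exons = [(300, 400), (100, 200), (500, 600)]
print(first_exon(exons, 1))   # leftmost exon
print(first_exon(exons, -1))  # rightmost exon
```

So for a gene on the negative strand, the first exon is the one with the highest coordinates, and its 5' end is its Exon_Chr_End_(bp) value.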
Many thanks for your answers,
To be more precise, this is the situation, given these two genes:
The question is: in which direction are these genes transcribed (from the start to the end for +1, and from the end to the start for -1)?
PS: the data was retrieved from Biomart.
Cheers!
strand = 1 is transcribed from 'start' to 'end'. strand = -1 is transcribed from 'end' to 'start'.
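In code, that rule could look like the short Python sketch below. It assumes start and end are forward-strand coordinates with start <= end and that strand is 1 or -1; the function names are hypothetical. Note that if you feed it gene-level start/end you get the outermost boundary of the gene, so for the TSS of one particular transcript you would probably want to use that transcript's own start and end instead.

```python
def transcription_start(start: int, end: int, strand: int) -> int:
    """Coordinate where transcription begins (hypothetical helper)."""
    if strand == 1:
        return start  # forward strand: transcribed from 'start' to 'end'
    if strand == -1:
        return end    # reverse strand: transcribed from 'end' to 'start'
    raise ValueError("strand must be 1 or -1")


def transcription_end(start: int, end: int, strand: int) -> int:
    """Coordinate where transcription finishes (hypothetical helper)."""
    if strand == 1:
        return end
    if strand == -1:
        return start
    raise ValueError("strand must be 1 or -1")


# Made-up coordinates, for illustration only
print(transcription_start(1000, 5000, 1))   # 1000
print(transcription_start(1000, 5000, -1))  # 5000
```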