Forward And Reverse Strand Conventions
4
70
Entering edit mode
14.1 years ago
Andrea_Bio ★ 2.8k

Hi

I apologise for the really basic question. I've probably said this a million times now, but I'm returning to this field after a long time and I keep confusing myself by misremembering or half-remembering things from the past, and it's not helping me. It would be easier coming to the field fresh, I think. Anyhow...

As I remember it, sequence databases always store the forward strand of a DNA chromosome in the 5' to 3' direction.

A gene is read in the 3' to 5' direction and so its complementary strand in the 5' to 3' direction is the same as the mRNA transcript. So I thought that if a gene was on the reverse strand of a DNA molecule, then the forward strand in the 5' to 3' direction gives the sequence of the corresponding mRNA (ignoring introns for simplicity).

However, I'm just looking at a gene now in Ensembl and this gene is described as being on the forward strand. If you look at the forward strand, it contains the exact same sequence as the gene's mRNA. So to me then, if the mRNA runs on the forward strand in the 5' to 3' direction, the actual gene is on the reverse strand, is it not?

Is this a convention issue that I have misremembered? Is a gene classed as being on the forward strand if its mRNA sequence is 'on' the forward strand?

thanks for your help

sequence strand • 191k views
ADD COMMENT
129
Entering edit mode
14.1 years ago
Bio_X2Y ★ 4.4k

Sounds like you're trying to knit a few hazy concepts together too quickly - it might help to start by completely ignoring the RNA polymerase (the machinery that does the transcription in the "opposite" direction), and building up your understanding from scratch. Here is a stab at an explanation, hopefully it won't make things worse!

Start with the basics:

  • DNA is double-stranded. By convention, for a reference chromosome, one whole strand is designated the "forward strand" and the other the "reverse strand". This designation is arbitrary. Sometimes the terms "plus strand" and "minus strand" are used instead.

  • Visually (I'm not talking about the transcription machinery yet), you would typically read the sequence of a strand in the 5-3 direction. For the forward strand, this means reading left-to-right, and for the reverse strand it means right-to-left.

  • A gene can live on a DNA strand in one of two orientations. The gene is said to have a coding strand (also known as its sense strand), and a template strand (also known as its antisense strand). For 50% of genes, its coding strand will correspond to the chromosome's forward strand, and for the other 50% it will correspond to the reverse strand.

  • The mRNA (and protein) sequence of a gene corresponds to the DNA sequence as read (again, visually) from the gene's coding strand. So the mRNA sequence always corresponds to the 5-3 coding sequence of a gene.

  • Now, the RNA polymerase machinery moves along the DNA in the 5-3 orientation of the coding strand (e.g. left-to-right for a forward strand gene). It reads the bases from the template strand (so it is reading in the 3-5 direction from the point-of-view of the template strand), and builds the mRNA as it goes. This means that the mRNA matches the coding sequence of the gene, not the template sequence. (This diagram from Wikipedia illustrates).

  • Annotations such as Ensembl and UCSC are concerned with the coding sequences of genes, so when they say a gene is on the forward strand, it means the gene's coding sequence is on the forward strand. To follow through again, that means that during transcription of this forward-strand gene, the gene's template sequence is read from the reverse strand, producing an mRNA that matches the sequence on the forward strand.
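The points above can be sketched in a few lines of Python. This is only an illustration: the 12-base sequence is made up and the helper names (`reverse_complement`, `transcribe`) are hypothetical, not from any library. It shows that the template strand is the reverse complement of the coding strand, and that transcribing the template gives an mRNA that matches the coding strand with U in place of T.

```python
# Illustrative only: toy sequence and hypothetical helper names.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite DNA strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]

def transcribe(template_5to3):
    """Build the mRNA (5'->3') from a template strand given 5'->3'.

    The polymerase reads the template 3'->5', so we walk the string
    backwards and pair each base with its RNA complement.
    """
    rna_pair = {"A": "U", "C": "G", "G": "C", "T": "A"}
    return "".join(rna_pair[base] for base in reversed(template_5to3))

coding = "ATGGCCTTATAA"                 # coding (sense) strand, 5'->3'
template = reverse_complement(coding)   # template (antisense) strand, 5'->3'
mrna = transcribe(template)

print(template)                          # TTATAAGGCCAT
print(mrna)                              # AUGGCCUUAUAA
print(mrna == coding.replace("T", "U"))  # True
```

For a forward-strand gene, `coding` would be a slice of the forward strand as stored in the database; for a reverse-strand gene, it would be the reverse complement of that slice.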

ADD COMMENT
2
Entering edit mode

Also trying to get this straight in my head. @Bio_X2Y:

you mention

"the RNA polymerase machinery moves along the DNA in the 5-3 orientation of the coding strand ... and builds the cDNA as it goes."

Not trying to be obtuse here, but doesn't the RNAP make RNA which would be complementary to the template which is identical to (except U's instead of T's) the coding strand?

"This means that the cDNA matches the coding sequence of the gene, not the template sequence."

Wouldn't the cDNA then, which is complementary to the mRNA (and coding sequence), be identical to the template?

ADD REPLY
0
Entering edit mode

I thought that was exactly what I said in my question? I quote: Is a gene classed as being on the forward strand if its mRNA sequence is 'on' the forward strand.

I've just always called the coding strand the mRNA sequence. Like you say, the template strand, which to me is the actual gene, is on the reverse strand. I should have been clearer with my wording.

ADD REPLY
0
Entering edit mode

forgot to say thank-you!

ADD REPLY
0
Entering edit mode

Fair enough, I guess the answer is "yes" then :) I partially used the answer to clarify things in my own head too. I can understand why you consider the template strand to be the "actual" gene, but I would always have conceptualised it the other way around (feels easier to me anyway). Things must get interesting in your view when you consider anti-sense transcription :)

ADD REPLY
0
Entering edit mode

Great answer, thanks for putting the time into crafting a detailed explanation!

ADD REPLY
0
Entering edit mode

@Bio_X2Y: You first mention that the designation of forward and reverse strands is arbitrary. Are you sure about this? I imagined that the forward strand was the one with the 5' end closest to the centromere, no?

ADD REPLY
0
Entering edit mode

@tony, I think you're correct - I ended up posting a follow up question here: Conventions For Designating Forward And Reverse Strands

ADD REPLY
0
Entering edit mode

Maybe I am misunderstanding the answer...

Quote: "the RNA polymerase machinery moves along the DNA in the 5-3 orientation of the coding strand ... and builds the cDNA as it goes."

Not trying to be obtuse here, but doesn't the RNAP make RNA which would be complementary to the template?

Quote: "This means that the cDNA matches the coding sequence of the gene, not the template sequence."

I would have thought that the cDNA, which is complementary to the mRNA (and coding sequence), is identical to the template, and not the coding sequence. Am I thinking about this wrong?

ADD REPLY
0
Entering edit mode

For 50% of genes, its coding strand will correspond to the chromosome's forward strand, and for the other 50% it will correspond to the reverse strand.

Would you please elaborate where these numbers are coming from? And why exactly 50-50%? Thanks!

ADD REPLY
2
Entering edit mode

that statement is not correct, it is not exactly 50% - what it is trying to get at is that there is no preference for one strand vs another.

ADD REPLY
0
Entering edit mode

Amazing explanation! Probably the most concise answer to differentiate between forward/reverse (+/-), coding/template, sense/anti-sense strand conventions.

ADD REPLY
11
Entering edit mode
14.1 years ago
Neilfws 49k

Short answer - yes. A "gene" is on the forward strand if its mRNA is on the forward strand.

I placed quotation marks around "gene" because, perhaps surprisingly, it is not a very useful word. In the context of this question we're using "sequence of a gene" to mean the same as "sequence of the mRNA transcribed from the gene." So we're referring to the same strand because we're talking about the same object.

What is a gene though? Most "genes" give rise to multiple transcripts. So we might say a gene is a region of DNA that serves as a template for transcription. A gene is not really a single object with start and end. It will have a "minimum start" (the 5'-most base from which transcription occurs) and a "maximum end" (the 3'-most base at which transcription terminates). So perhaps it's best to forget about genes and think about transcripts and their properties.

I agree that the "start < end" convention takes some getting used to but it makes a lot of sense when it comes to performing range calculations with sequences. If you just name the strands "+" or "-" and set start < end, you can forget about all the other terminology (forward/reverse, 5'/3' and so on), it all "just works".
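As a rough illustration of why this works well, here is a small Python sketch. It assumes 1-based, inclusive coordinates with start <= end (an assumption for this example; some formats, such as BED, are 0-based and half-open), and the function names and toy reference sequence are made up.

```python
# Illustrative sketch: 1-based inclusive coordinates are an assumption here.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def feature_sequence(forward, start, end, strand):
    """Return a feature's 5'->3' sequence from the forward-strand reference."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    region = forward[start - 1:end]
    if strand == "-":
        # Same coordinates, but read from the opposite strand.
        region = region.translate(COMPLEMENT)[::-1]
    return region

def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared bases between two ranges, whatever their strands."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

reference = "AAACCCGGGT"  # made-up forward strand, 5'->3'
plus = feature_sequence(reference, 2, 5, "+")
minus = feature_sequence(reference, 2, 5, "-")
print(plus)                          # AACC
print(minus)                         # GGTT
print(overlap_length(2, 5, 4, 9))    # 2
```

Because start < end holds for both strands, the overlap calculation never needs to look at the strand at all; the strand only matters at the last step, when the sequence is read out.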

ADD COMMENT
4
Entering edit mode
2.0 years ago

I often find myself looking up what forward and reverse strands represent (along with their various synonyms) so I decided to compile some of my research in one place. Please correct me if I'm off about anything, there's a lot of divergent and database/organism-specific information out there.

  • Forward vs reverse – by convention, in eukaryotes, the forward strand is defined as the strand with its 5' end at the tip of the short arm (p arm) of the chromosome. Databases store the sequence of the forward strand and genome browsers display it in 5' to 3' (left to right) orientation with the coordinates increasing from 1 (or 0) starting at the 5' end. The reverse strand uses the coordinates defined by the forward strand, so for the reverse strand, the 5' end has higher values than the 3' end.

5' ------X---------------------- 3' <-- forward strand, typically shown in this orientation by genome browsers
3' ------X---------------------- 5' <-- reverse strand, typically not shown by genome browsers
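A tiny Python sketch of the shared coordinate system, assuming 1-based coordinates and a made-up chromosome length of 10 (the function names are hypothetical). The base marked X keeps the same coordinate on both strands; only the reading order changes.

```python
# Illustrative only: 1-based coordinates and a toy chromosome length.
def reverse_strand_walk(length):
    """Forward-strand coordinates in the order they are met when
    reading the reverse strand 5'->3'."""
    return list(range(length, 0, -1))

def offset_from_reverse_5prime(position, length):
    """How far into the reverse strand (counting from 1 at its 5' end)
    the base at forward coordinate `position` lies."""
    return length - position + 1

walk = reverse_strand_walk(10)
print(walk)                               # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(offset_from_reverse_5prime(7, 10))  # 4
```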

  • Coding (information) vs noncoding – as I understand it, these are relative terms that only apply in regions of the genome that code for proteins (that is, they are not absolute terms that apply to the full chromosome-length strand of DNA). Within such a region, a strand is the coding strand if the pre-mRNA sequence of the gene encoded there matches that strand's sequence (aside from thymine/uracil) and orientation; depending on the gene, this can be either the forward (plus) or the reverse (minus) strand. In other words, a strand is called the coding strand if the other strand acts as the template during transcription. The coding strand contains the codons; the noncoding strand contains their reverse complement.

  • Nontemplate vs template – synonyms of coding and noncoding respectively.

  • Plus (+) vs minus (-) – synonyms of forward and reverse respectively. A gene has a plus (+) orientation if its pre-mRNA sequence matches that of the plus strand.

  • Positive vs negative – synonyms of forward and reverse respectively.

  • Sense (positive-sense) vs antisense (negative-sense) – synonyms for coding and noncoding respectively.

  • Top vs bottom – this is a convention for labeling SNPs that was invented by Illumina and was later adopted by dbSNP. It is not directly related to Fwd/Rev or (+/-). The rules for determining whether a SNP is Top/Bot are somewhat involved so I refer you to Illumina’s technical note on this.

  • Watson vs Crick – synonyms of forward and reverse respectively.

To summarize:

  • Forward = Plus = (+) = Positive = Watson
  • Reverse = Minus = (-) = Negative = Crick
  • Coding = Nontemplate = Information = Sense = Positive-sense
  • Noncoding = Template = Antisense = Negative-sense

Here are the sources I used to compile the above, in decreasing order of scholarliness:

ADD COMMENT
1
Entering edit mode

I would say that defining sense/antisense as coding/template does not bring clarity.

The same DNA region may be coding and template at the same time. Since DNA is double-stranded there is ambiguity as to which strand we refer to at any time.

I think it is best to define sense/antisense as a direction defined by the polarity of the single-stranded DNA. That direction is unambiguously defined: 5' -> 3'

DNA has a 5' and a 3' end. A transcript is defined in a 5' to 3' direction. Another sequence that matches in that 5' to 3' direction is in the sense direction; if it matches 3' to 5', it is in the antisense direction.
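A minimal Python sketch of that definition, assuming exact substring matching and DNA-alphabet sequences written 5' to 3' (the function name and sequences are made up for illustration):

```python
# Illustrative only: exact substring matching is a simplifying assumption.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orientation(query, transcript):
    """Classify `query` relative to `transcript` (both written 5'->3')."""
    if query in transcript:
        return "sense"
    if query.translate(COMPLEMENT)[::-1] in transcript:
        return "antisense"
    return "no match"

transcript = "ATGGCCTTATAA"
print(orientation("GCCTT", transcript))  # sense
print(orientation("AAGGC", transcript))  # antisense
print(orientation("GGGGG", transcript))  # no match
```

Note that a query equal to its own reverse complement would be reported as "sense" here, since that check runs first.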

ADD REPLY
0
Entering edit mode
14.1 years ago

'...A gene is read in the 3' to 5' direction and so its REVERSE complementary strand in the 5' to 3' direction is the same as the mRNA transcript...'

ADD COMMENT
0
Entering edit mode

The cDNA sequence always corresponds to the 5-3 coding sequence of a gene. You don't need to reverse it.

ADD REPLY
0
Entering edit mode

I meant the genomic sequence of the gene. See the SQL table at UCSC: all the chromStart positions are lower than the chromEnd positions, whatever the orientation.

ADD REPLY
0
Entering edit mode

I don't think I follow what your point is. Bio_x2y has confirmed for me what I thought about template and coding strands.

I think what you are talking about is another convention issue, whereby the start of a gene is always less than the end of a gene even if you are on the reverse strand. On the reverse strand, the start of the coding strand is higher than the end if you are using the 5' end of the forward strand as base 1. But by convention you give the gene coordinates so that start is always less than end (and flip the start and end coordinates in your transformations).

ADD REPLY
0
Entering edit mode

Personally I've always thought it looks a bit odd when you see a gene on the reverse strand and the start position of its second exon is lower than the end position of its previous exon. I guess you just get used to reading from right to left for reverse strand stuff.
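A short Python sketch of that right-to-left reading, with made-up exon coordinates stored as (start, end) pairs where start < end, and a hypothetical helper name:

```python
# Illustrative only: made-up exon coordinates, stored with start < end.
def exons_in_transcript_order(exons, strand):
    """Sort (start, end) exon tuples into 5'->3' transcript order."""
    return sorted(exons, key=lambda exon: exon[0], reverse=(strand == "-"))

exons = [(300, 380), (100, 150), (500, 560)]
plus_order = exons_in_transcript_order(exons, "+")
minus_order = exons_in_transcript_order(exons, "-")
print(plus_order)   # [(100, 150), (300, 380), (500, 560)]
print(minus_order)  # [(500, 560), (300, 380), (100, 150)]
```

For the reverse-strand gene, the first exon of the transcript is the one with the highest coordinates, which is why the second exon's start is lower than the first exon's end.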

ADD REPLY
