What Does Cds And Exon Mean In A Gtf File?
11.7 years ago
dfernan ▴ 770

Hi,

I have an hg19 GTF file that I sorted by chromosome, start and end position, and grouped by gene_id.

Here is an example of a few lines of the file:

lines 45-47

chr1  refGene transcript  367659  368597  . + . gene_id "OR4F29"; transcript_id "NM_001005221"; gene_name "OR4F29";
chr1  refGene exon  367659  368597  . + . gene_id "OR4F29"; transcript_id "NM_001005221"; exon_number "1"; exon_id "NM_001005221.1"; gene_name "OR4F29";
chr1  refGene CDS 367659  368594  . + 0 gene_id "OR4F29"; transcript_id "NM_001005221"; exon_number "1"; exon_id "NM_001005221.1"; gene_name "OR4F29";

lines 143-146

chr1  refGene transcript  861121  879961  . + . gene_id "SAMD11"; transcript_id "NM_152486"; gene_name "SAMD11";
chr1  refGene exon  861121  861180  . + . gene_id "SAMD11"; transcript_id "NM_152486"; exon_number "1"; exon_id "NM_152486.1"; gene_name "SAMD11";
chr1  refGene exon  861302  861393  . + . gene_id "SAMD11"; transcript_id "NM_152486"; exon_number "2"; exon_id "NM_152486.2"; gene_name "SAMD11";
chr1  refGene CDS 861322  861393  . + 0 gene_id "SAMD11"; transcript_id "NM_152486"; exon_number "2"; exon_id "NM_152486.2"; gene_name "SAMD11";

I was wondering what CDS means. I understand that the transcript is one isoform of the gene and that the exon lines give the exon positions of that isoform, but what does the CDS line mean?

Sorry about such a basic question, but I got confused.

gtf rna-seq • 29k views
11.7 years ago

Don't feel bad, this is not a basic question at all. The terminology is not nearly as obvious as it should be, and the exact definitions carry many subtle details. The Sequence Ontology specifies the meaning of each term:

CDS: "A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon."

http://www.sequenceontology.org/browser/current_cvs/term/SO:0000316

Exon: "A region of the transcript sequence within a gene which is not removed from the primary RNA transcript by RNA splicing."

http://www.sequenceontology.org/browser/current_cvs/term/SO:0000147

Ironically, I have come to believe that this definition is incorrect. The CDS should not actually include the stop codon, and most annotations that label a feature as CDS do not include it. IMHO that is the correct behavior: the stop codon is not translated into an amino acid, so it is not actually coding.
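
A quick arithmetic check on the OR4F29 lines from the question seems consistent with this. The sketch below only uses the coordinates shown in the excerpt; reading the three exon bases after the CDS as the stop codon is an inference, since the excerpt has no stop_codon line to confirm it.

```python
# Coordinates copied from the OR4F29 (NM_001005221) lines in the question.
# GTF coordinates are 1-based and inclusive on both ends.
exon = (367659, 368597)
cds = (367659, 368594)

# Inclusive coordinates, so length is end - start + 1.
cds_length = cds[1] - cds[0] + 1

# A CDS without the stop codon should still be a whole number of codons.
whole_codons = cds_length % 3 == 0
codon_count = cds_length // 3

# Exon bases left over after the last CDS base (+ strand).
trailing_bases = exon[1] - cds[1]

print(cds_length, whole_codons, codon_count, trailing_bases)
```

Here the CDS is 936 bases, i.e. 312 whole codons, and the exon extends exactly 3 bases past the CDS end. That is what you would expect if this file leaves the stop codon out of the CDS feature, although a 3-base 3' UTR cannot be ruled out from the coordinates alone.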

11.7 years ago
Sangwoo Kim ▴ 440

IMHO, an exon can contain both UTR and CDS, and the CDS is the part of the sequence that is actually translated into protein. In your SAMD11 example, the exonic region upstream of 861322 is thought to be 5' UTR, which is transcribed into mRNA but is not translated.
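
The split described above can be sketched in a few lines of Python. This is only an illustration using the SAMD11 coordinates from the question; `five_prime_utr_plus` is a hypothetical helper name, and it assumes a + strand transcript (for a - strand transcript the 5' UTR lies at the high-coordinate end instead).

```python
# Coordinates copied from the SAMD11 (NM_152486) lines in the question.
# GTF coordinates are 1-based and inclusive; the transcript is on the + strand.
exons = [(861121, 861180), (861302, 861393)]
cds_start = 861322  # first CDS base listed in the excerpt


def five_prime_utr_plus(exons, cds_start):
    """Return the exon pieces that lie upstream of cds_start (+ strand only)."""
    utr = []
    for start, end in sorted(exons):
        if end < cds_start:
            # Whole exon is upstream of the coding region.
            utr.append((start, end))
        elif start < cds_start:
            # Exon contains the CDS start: keep only the part before it.
            utr.append((start, cds_start - 1))
    return utr


utr = five_prime_utr_plus(exons, cds_start)
utr_length = sum(end - start + 1 for start, end in utr)

print(utr)         # [(861121, 861180), (861302, 861321)]
print(utr_length)  # 80
```

So exon 1 is entirely 5' UTR, and exon 2 is part UTR (861302-861321) and part CDS (861322-861393). The intron between the two exons is not counted, because it is spliced out of the mRNA.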


This clarifies things. The 5' UTR, CDS and 3' UTR together make up the exons.

11.7 years ago
fo3c ▴ 450

CoDing Sequence
