Hi,
I have an hg19 GTF file that I ordered by chromosome, start and end position, and grouped by gene_id.
Here is an example of a few lines of the file:
lines 45-47
chr1 refGene transcript 367659 368597 . + . gene_id "OR4F29"; transcript_id "NM_001005221"; gene_name "OR4F29";
chr1 refGene exon 367659 368597 . + . gene_id "OR4F29"; transcript_id "NM_001005221"; exon_number "1"; exon_id "NM_001005221.1"; gene_name "OR4F29";
chr1 refGene CDS 367659 368594 . + 0 gene_id "OR4F29"; transcript_id "NM_001005221"; exon_number "1"; exon_id "NM_001005221.1"; gene_name "OR4F29";
lines 143-146
chr1 refGene transcript 861121 879961 . + . gene_id "SAMD11"; transcript_id "NM_152486"; gene_name "SAMD11";
chr1 refGene exon 861121 861180 . + . gene_id "SAMD11"; transcript_id "NM_152486"; exon_number "1"; exon_id "NM_152486.1"; gene_name "SAMD11";
chr1 refGene exon 861302 861393 . + . gene_id "SAMD11"; transcript_id "NM_152486"; exon_number "2"; exon_id "NM_152486.2"; gene_name "SAMD11";
chr1 refGene CDS 861322 861393 . + 0 gene_id "SAMD11"; transcript_id "NM_152486"; exon_number "2"; exon_id "NM_152486.2"; gene_name "SAMD11";
I was wondering what the CDS feature means. As I understand it, the transcript is one isoform of the gene and the exon lines give the exon positions of that isoform, but what does the CDS represent?
Sorry for such a basic question, but I got confused.
Ironically, I have come to believe that this definition is incorrect. The CDS should not include the stop codon, and most annotations that label a feature as CDS do not include it. IMHO that is the correct behavior: the stop codon is not translated into an amino acid, so it is not actually coding.
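For what it's worth, the OR4F29 lines from the question look consistent with the stop codon being left out of the CDS. Here is a rough Python sketch that just does the coordinate arithmetic on the exon and CDS records (attributes shortened; the helper name feature_length is mine, not from any library). It assumes GTF coordinates are 1-based and inclusive, and it does not look at the sequence itself, so treat it as a sanity check rather than proof.

```python
# Exon and CDS records for OR4F29, copied from the question
# (attribute column shortened for readability).
records = [
    'chr1 refGene exon 367659 368597 . + . gene_id "OR4F29"; transcript_id "NM_001005221";',
    'chr1 refGene CDS 367659 368594 . + 0 gene_id "OR4F29"; transcript_id "NM_001005221";',
]

def feature_length(line):
    """Return (feature_type, length) for one GTF line.

    Splits on any whitespace so it works for tab-separated files and for
    the space-separated lines pasted above. GTF start/end are 1-based and
    inclusive, hence the +1.
    """
    fields = line.split(None, 8)  # 8 fixed columns + the attribute string
    start, end = int(fields[3]), int(fields[4])
    return fields[2], end - start + 1

lengths = dict(feature_length(r) for r in records)
cds_len = lengths["CDS"]
exon_len = lengths["exon"]

print("CDS length:", cds_len)                   # 936
print("CDS length mod 3:", cds_len % 3)         # 0, a whole number of codons
print("exon minus CDS:", exon_len - cds_len)    # 3 bases left over after the CDS
```

The CDS is a whole number of codons, and the exon runs exactly 3 bases past the end of the CDS, which is what you would expect if the stop codon sits just outside the CDS in this file.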