Hello,
I thought I had understood the difference between the two terms (exon and CDS), but I am afraid I still need a clear explanation. Is the following correct?
- Exon: A sequence which remains present in a mature RNA.
- CDS: A sequence which remains present in a mature RNA and codes for a protein (i.e. gets translated).
Based on these definitions, I would expect every CDS to be contained within exons. Now, on the UCSC "Get Genomic Sequence Near Gene" page, I have the following (mutually exclusive) display options:
- Exons in upper case, everything else in lower case
- CDS in upper case, UTR in lower case
I would therefore expect that when I select option 2 (CDS in upper case), there are fewer nucleotides in upper case than with option 1 (exons in upper case).
But if I compare the results for the 2 options on the same sequence, I observe the following:
- A) Entire sequences in upper case in option 1 become lower case in option 2
- B) Entire sequences in lower case in option 1 become upper case in option 2
I can understand A (the parts of exons that are UTR, and thus non-coding, become lower case in option 2), but I don't understand at all why B also happens.
Any clue?
Thanks for your help.
Hello Adrian,
Ok so your definitions correspond to mine, i.e., CDS are included in exons.
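To make the "CDS are included in exons" point concrete, here is a minimal sketch that derives the coding segments as the overlap between each exon and the coding range. It assumes 0-based, half-open coordinates in the style of UCSC gene tables (exon starts/ends plus a single CDS start and end); the function name `cds_segments` is made up for this illustration.

```python
def cds_segments(exons, cds_start, cds_end):
    """Return the coding part of each exon as (start, end) tuples.

    exons: list of (start, end) tuples, 0-based half-open (assumed convention).
    cds_start, cds_end: coding range; equal values are treated as "no CDS".
    """
    segments = []
    for start, end in exons:
        # Clip the exon to the coding range.
        seg_start = max(start, cds_start)
        seg_end = min(end, cds_end)
        if seg_start < seg_end:
            segments.append((seg_start, seg_end))
    return segments


exons = [(0, 10), (20, 30), (40, 50)]
print(cds_segments(exons, 5, 45))   # coding transcript
print(cds_segments(exons, 50, 50))  # empty coding range, i.e. non-coding
```

Every segment returned lies inside an exon by construction, and whatever is left of the exons outside those segments is UTR.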
You could be looking at something with very small introns and large UTRs, in which case option 2 will have more lower case than option 1.
I guess option 2 should always have more (or equal) lower case bases than option 1.
Maybe you observe both A and B because your gene is not protein-coding and therefore has no CDS?
But in that case, why does B happen, rather than everything simply being lower case with option 2?
B) should never happen. Can you give an example gene? That would seem to be an error in the annotation (though the UCSC annotations aren't that great, use Ensembl).
Here is an example: the first entry of the following FASTA output goes from lower to upper case. All other parameters are the same; only the display options differ:
With option 1:
http://genome.ucsc.edu/cgi-bin/hgc?hgsid=381409833_XLyyszThcNrKOH4dtiDua1kPTT1k&g=htcDnaNearGene&i=uc002wxs.3&c=chr20&l=30946146&r=31027122&o=knownGene&boolshad.hgSeq.promoter=0&hgSeq.promoterSize=1000&hgSeq.utrExon5=on&boolshad.hgSeq.utrExon5=0&hgSeq.cdsExon=on&boolshad.hgSeq.cdsExon=0&hgSeq.utrExon3=on&boolshad.hgSeq.utrExon3=0&hgSeq.intron=on&boolshad.hgSeq.intron=0&boolshad.hgSeq.downstream=0&hgSeq.downstreamSize=1000&hgSeq.granularity=feature&hgSeq.padding5=0&hgSeq.padding3=0&hgSeq.splitCDSUTR=on&boolshad.hgSeq.splitCDSUTR=0&hgSeq.casing=cds&boolshad.hgSeq.maskRepeats=0&hgSeq.repMasking=lower&submit=submit
With option 2:
http://genome.ucsc.edu/cgi-bin/hgc?hgsid=381409833_XLyyszThcNrKOH4dtiDua1kPTT1k&g=htcDnaNearGene&i=uc002wxs.3&c=chr20&l=30946146&r=31027122&o=knownGene&boolshad.hgSeq.promoter=0&hgSeq.promoterSize=1000&hgSeq.utrExon5=on&boolshad.hgSeq.utrExon5=0&hgSeq.cdsExon=on&boolshad.hgSeq.cdsExon=0&hgSeq.utrExon3=on&boolshad.hgSeq.utrExon3=0&hgSeq.intron=on&boolshad.hgSeq.intron=0&boolshad.hgSeq.downstream=0&hgSeq.downstreamSize=1000&hgSeq.granularity=feature&hgSeq.padding5=0&hgSeq.padding3=0&hgSeq.splitCDSUTR=on&boolshad.hgSeq.splitCDSUTR=0&hgSeq.casing=exon&boolshad.hgSeq.maskRepeats=0&hgSeq.repMasking=lower&submit=submit
If you tell it to include introns and select "CDS in upper case, UTR in lower case", then the case of the introns will probably be whatever it is in the genome to begin with (upper case in the example you gave). There's no option for "CDS in upper case, everything else in lower case" as there is for exons.
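The explanation above can be sketched in a few lines of Python. This is only an illustration of the hypothesis in this reply, not UCSC's actual code: it assumes that the "exon" mode lower-cases everything outside exons, while the "cds" mode only touches exonic bases and leaves introns in whatever case the genome sequence already has. The function `apply_casing` and the toy coordinates are invented for the example (0-based, half-open).

```python
def apply_casing(seq, exons, cds_start, cds_end, mode):
    """Re-case a genomic sequence under one of two hypothetical display modes.

    mode == "exon": exons upper case, everything else lower case.
    mode == "cds":  CDS upper case, UTR lower case, non-exonic bases untouched
                    (assumption taken from the reply above).
    """
    if mode not in ("exon", "cds"):
        raise ValueError("mode must be 'exon' or 'cds'")
    # In exon mode, start from an all-lower-case sequence; in cds mode keep the input case.
    chars = list(seq.lower()) if mode == "exon" else list(seq)
    for start, end in exons:
        for i in range(start, end):
            if mode == "exon" or cds_start <= i < cds_end:
                chars[i] = seq[i].upper()
            else:
                chars[i] = seq[i].lower()
    return "".join(chars)


genome = "ACGTACGTACGTACGTACGT"   # toy sequence, upper case as in the example gene
exons = [(0, 6), (12, 20)]        # one intron at positions 6..11
print(apply_casing(genome, exons, 3, 15, "exon"))
print(apply_casing(genome, exons, 3, 15, "cds"))
```

Under these assumptions the intron is lower case in the first output and upper case in the second, which is observation B, while the UTR ends of the exons switch from upper to lower case, which is observation A.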