Can a CDS also be an intron?
12.7 years ago
Panos ★ 1.8k

I have a gff file and saw a portion of it that looks like this:

DS239414    GenBank    gene    3787    6375
DS239414    GenBank    mRNA    3787    6375
DS239414    GenBank    mRNA    3787    6375
DS239414    GenBank    CDS    4036    4200
DS239414    GenBank    CDS    4379    4561
DS239414    GenBank    CDS    4645    4815
DS239414    GenBank    CDS    4963    5129
DS239414    GenBank    CDS    5611    5695
DS239414    GenBank    CDS    5951    6050
DS239414    GenBank    CDS    6187    6215
DS239414    GenBank    exon    3787    4200
DS239414    GenBank    exon    4379    4561
DS239414    GenBank    exon    4645    4815
DS239414    GenBank    exon    4963    5129
DS239414    GenBank    exon    5611    5695
DS239414    GenBank    exon    5951    6050
DS239414    GenBank    exon    6187    6375
DS239414    GenBank    CDS    4036    4200
DS239414    GenBank    CDS    4379    4561
DS239414    GenBank    CDS    4645    4815
DS239414    GenBank    CDS    4963    5215
DS239414    GenBank    CDS    5577    5695
DS239414    GenBank    CDS    5951    6050
DS239414    GenBank    CDS    6187    6215
DS239414    GenBank    exon    4963    5215
DS239414    GenBank    exon    5577    5695

The first thing I understand is that there are 2 different transcripts for this gene (because there are two mRNA fields). What appears weird, however, is that in the second transcript, the first 3 CDSs appear to be introns (because of the position where the first exon starts).

Am I right? Is this possible? Or is it something else that I don't understand in the gff format?

gff intron exon cds • 6.1k views
12.7 years ago

Yes, it can, because of intron retention, among other things. That is not the issue here, though, as far as I can tell. Rather, it looks like a GenBank-to-GFF conversion that has lost the splicing information for the mRNA: the mRNA coordinates are the span of the pre-mRNA, not of the post-splicing mRNA itself. If you look, you can see that the "mRNA" starts where the gene and the first exon start, and ends where the gene and the last exon end. And all CDSs are contained within exon spans.
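Both observations can be checked directly against the records in the question. The following is a minimal sketch, not part of the original answer: the coordinates are copied from the question, the function names are illustrative, and the exons of both transcripts are pooled because the truncated file does not say which exon belongs to which mRNA.

```python
# Coordinates copied from the question (1-based, inclusive).
MRNA = (3787, 6375)
EXONS = [
    (3787, 4200), (4379, 4561), (4645, 4815), (4963, 5129),
    (5611, 5695), (5951, 6050), (6187, 6375),
    (4963, 5215), (5577, 5695),  # the two exons listed at the end of the block
]
CDS = [
    (4036, 4200), (4379, 4561), (4645, 4815), (4963, 5129),
    (5611, 5695), (5951, 6050), (6187, 6215),
    (4963, 5215), (5577, 5695),  # segments that differ in the second CDS set
]


def contained(inner, outer):
    """True if the interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def exon_span(exons):
    """Smallest start and largest end over all exons."""
    return (min(start for start, _ in exons), max(end for _, end in exons))


def orphan_cds(cds, exons):
    """CDS segments that are not contained in any exon."""
    return [c for c in cds if not any(contained(c, x) for x in exons)]


# The mRNA record covers exactly the outermost exon boundaries (pre-mRNA span).
print(exon_span(EXONS) == MRNA)  # True
# With the exons pooled, no CDS segment falls outside an exon.
print(orphan_cds(CDS, EXONS))    # []
```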


Where did you get the GFF?

It is incomplete by any GFF standard: every version of the format requires 9 columns.
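A quick way to see how many columns a file actually has is to count the tab-separated fields on each feature line. This is only a sketch: the function name is made up for illustration, the lines are assumed to be tab-delimited, and the score, strand, phase and attribute values in the 9-column example are placeholders, not values from the real record.

```python
def column_counts(gff_text):
    """Return the set of tab-separated field counts seen on feature lines."""
    counts = set()
    for line in gff_text.splitlines():
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        counts.add(len(line.split("\t")))
    return counts


# First record from the question, as it looks after trimming to 5 columns.
truncated = "DS239414\tGenBank\tgene\t3787\t6375\n"
# Same record padded to 9 columns with placeholder values (hypothetical).
padded = "DS239414\tGenBank\tgene\t3787\t6375\t.\t.\t.\tID=placeholder\n"

print(column_counts(truncated))  # {5}
print(column_counts(padded))     # {9}
```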

It looks like the first 5 columns of the output from bp_genbank2gff3.pl. Try the conversion yourself at http://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html and you will get 'only differing exons'.

NCBI recently overhauled their GFF3 conversion software but the bits are not available yet, as noted here: https://groups.google.com/forum/?fromgroups#!topic/bioperl-l/TYbSSKNQZQM

http://www.ebi.ac.uk/cgi-bin/readseq.cgi gives GFF2, which might serve your purpose better, depending...


Casey, you're right that it's a conversion error! I looked at the corresponding GenBank file, and it looks like the GFF file reports only the differing exons for the second transcript. This is why 5 CDSs (and not 3, as I mistakenly wrote in my question) appear to be inside introns.
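The count of 5 can be reproduced from the coordinates in the question. The sketch below is illustrative only: the names are invented, and the `rebuild` step assumes that each differing exon simply replaces the first-transcript exon it overlaps, which is a guess about the conversion rather than something read from the GenBank record.

```python
# Coordinates copied from the question (1-based, inclusive).
T1_EXONS = [
    (3787, 4200), (4379, 4561), (4645, 4815), (4963, 5129),
    (5611, 5695), (5951, 6050), (6187, 6375),
]
T2_LISTED_EXONS = [(4963, 5215), (5577, 5695)]  # only the differing exons
T2_CDS = [
    (4036, 4200), (4379, 4561), (4645, 4815), (4963, 5215),
    (5577, 5695), (5951, 6050), (6187, 6215),
]


def overlaps(a, b):
    """True if the two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def uncovered(cds, exons):
    """CDS segments not contained in any of the given exons."""
    return [c for c in cds
            if not any(x[0] <= c[0] and c[1] <= x[1] for x in exons)]


def rebuild(base_exons, differing_exons):
    """Assumption: a differing exon replaces every base exon it overlaps."""
    kept = [x for x in base_exons
            if not any(overlaps(x, d) for d in differing_exons)]
    return sorted(kept + differing_exons)


# Using only the exons listed for the second transcript, 5 CDSs look intronic.
print(len(uncovered(T2_CDS, T2_LISTED_EXONS)))  # 5

# After borrowing the shared exons from the first transcript, none do.
t2_exons = rebuild(T1_EXONS, T2_LISTED_EXONS)
print(uncovered(T2_CDS, t2_exons))  # []
```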


I used bp_genbank2gff3.pl to create it from the corresponding GenBank file that I downloaded from NCBI. And yes, I used "cut" to exclude all "irrelevant" fields; I only wanted to ask you guys what was wrong with the second splice variant!

12.7 years ago
Dave Lunt ★ 2.0k

Panos, a GenBank CDS shouldn't contain introns:

"coding region, coding sequence. CDS refers to the portion of a genomic DNA sequence that is translated, from the start codon to the stop codon, inclusively, if complete. A partial CDS lacks part of the complete CDS (it may lack either or both the start and stop codons). Successful translation of a CDS results in the synthesis of a protein."

But I'm afraid I don't know much about GFF. Could this be related to alternative splicing of exons?

12.7 years ago
Hien ▴ 10

I agree with Dave and would say that this looks like a case of alternative splicing. The second CDS may shift the open reading frame relative to the first. Maybe you should check for stop codons in the peptide sequences and look for supporting evidence for these models, such as ESTs.
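A cheap first look at the reading frame, before translating anything, is to add up the CDS segment lengths of each model and see whether the total is a multiple of 3. This is only a sketch with illustrative names, using the coordinates from the question. A complete CDS should pass this test, but passing it says nothing about internal stop codons, so it does not replace the translation check suggested above.

```python
# CDS segments copied from the question (1-based, inclusive coordinates).
CDS_MODEL_1 = [
    (4036, 4200), (4379, 4561), (4645, 4815), (4963, 5129),
    (5611, 5695), (5951, 6050), (6187, 6215),
]
CDS_MODEL_2 = [
    (4036, 4200), (4379, 4561), (4645, 4815), (4963, 5215),
    (5577, 5695), (5951, 6050), (6187, 6215),
]


def cds_length(segments):
    """Total number of coding bases across all segments."""
    return sum(end - start + 1 for start, end in segments)


for name, model in (("model 1", CDS_MODEL_1), ("model 2", CDS_MODEL_2)):
    total = cds_length(model)
    # A remainder of 0 means the segments add up to whole codons.
    print(name, total, total % 3)
# model 1 900 0
# model 2 1020 0
```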
