Question

How to correctly build reading frames for novel splice isoforms?

10.4 years ago

jacobsen.jeremy ▴ 40

I'm interested in identifying potential proteins that could map to novel splice isoforms.

I have run Cufflinks and I have a list of high-confidence isoforms which might be novel. Now I want to determine whether any of these could code for proteins. I have written code that outputs a polypeptide sequence based on the exons that Cufflinks identified as belonging to a given transcript. I'm pretty lost at this point because I don't have a clear understanding of how to construct my reading frames. I'm hoping to explain where I am so far so that someone can tell me where I've made incorrect assumptions. Thanks.

Here's what the code does:

1-> It gets a list of potentially novel isoforms from the Cuffcompare .tmap file:

SLMO2-ATP5E    NR_037929    j    CUFF.72292    CUFF.72292.1    100    637.607724    628.443874    646.771574    23542.49981
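A minimal sketch of this step, assuming the leading .tmap columns are named as in the Cuffcompare header (ref_gene_id, ref_id, class_code, cuff_gene_id, cuff_id). The function names are made up for illustration and are not the original code.

```python
# Names of the leading .tmap columns (assumed from the Cuffcompare header);
# the remaining numeric columns are ignored in this sketch.
TMAP_COLUMNS = ["ref_gene_id", "ref_id", "class_code", "cuff_gene_id", "cuff_id"]

def parse_tmap_line(line):
    """Return a dict of the leading .tmap fields for one data line."""
    return dict(zip(TMAP_COLUMNS, line.split()))

def novel_isoform_ids(lines, wanted_codes=("j",)):
    """Collect Cufflinks transcript IDs whose class code is in wanted_codes."""
    ids = []
    for line in lines:
        if not line.strip() or line.startswith("ref_gene_id"):
            continue  # skip blank lines and the header row
        record = parse_tmap_line(line)
        if record["class_code"] in wanted_codes:
            ids.append(record["cuff_id"])
    return ids

example = ("SLMO2-ATP5E    NR_037929    j    CUFF.72292    CUFF.72292.1    "
           "100    637.607724    628.443874    646.771574    23542.49981")
print(novel_isoform_ids([example]))  # ['CUFF.72292.1']
```

Class code "j" marks a transcript that shares at least one splice junction with a reference transcript, which is why it is the default filter here.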

2-> It gets all exons for CUFF.72292.1 from the Cuffcompare combined file:

chr20    Cufflinks    exon    57601521    57601524    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "1"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
chr20    Cufflinks    exon    57603862    57603896    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "2"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
chr20    Cufflinks    exon    57605358    57605484    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "3"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
chr20    Cufflinks    exon    57607275    57607422    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "4"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
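This step could be sketched roughly as follows, assuming whitespace-separated GTF columns and grouping on the oId attribute. The helper names are hypothetical, and the two example lines are abbreviated copies of the records above.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: oId "CUFF.72292.1";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)";')

def parse_gtf_line(line):
    """Parse one GTF line; coordinates are 1-based and inclusive."""
    cols = line.strip().split(None, 8)  # 9th column holds the attributes
    record = {
        "chrom": cols[0],
        "feature": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
    }
    record.update(ATTRIBUTE_RE.findall(cols[8]))
    return record

def exons_by_transcript(lines, id_key="oId"):
    """Group exon records by transcript ID, sorted by genomic start."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = parse_gtf_line(line)
        if record["feature"] == "exon":
            exons[record[id_key]].append(record)
    for records in exons.values():
        records.sort(key=lambda r: r["start"])
    return dict(exons)

# Abbreviated attribute strings, deliberately out of order to show the sort.
gtf_lines = [
    'chr20 Cufflinks exon 57603862 57603896 . - . gene_id "XLOC_046344"; exon_number "2"; oId "CUFF.72292.1";',
    'chr20 Cufflinks exon 57601521 57601524 . - . gene_id "XLOC_046344"; exon_number "1"; oId "CUFF.72292.1";',
]
exons = exons_by_transcript(gtf_lines)["CUFF.72292.1"]
print([(e["start"], e["end"], e["end"] - e["start"] + 1) for e in exons])
# [(57601521, 57601524, 4), (57603862, 57603896, 35)]
```

Sorting by genomic start keeps the exons in a predictable order regardless of strand; the strand is applied later, when the sequence is assembled.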

Here's where I'm confused:

3-> Based on the strand, it grabs each exon DNA sequence from the chromosome FASTA file, combines them, and constructs three peptides (one for each frame):

(-)Frame_0:                                        
[CGAEKAKTPD*KDADLAGRLGCNGRRTAKPGCSRRKRCRTTG*PLSDLSRCRL*GSRHVFVTLYVTSVLSFVYDSSEDRRCIFNTFISSLLDGTDFELYDVKVP]                                        
(-)Frame_1:                                        
[AGRRRRRHQTRRTPTWRADSAVTAAEPLSRAARGESDVVPPDDLCPT*VDVGYEGLDTFSSLST*LLS*VSFTTLLKTVVAFLTLSFLPY*MGLISNFTM*RF]                                        
(-)Frame_2:                                        
[RGGEGEDTRLEGRRLGGPTRL*RPQNR*AGLLEAKAMSYHRMTSVRPESM*AMRV*TRFRHSLRDFCLKFRLRLF*RPSLHF*HFHFFLIRWD*FRTLRCKGS]
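A rough sketch of this step, under these assumptions: the chromosome is already loaded as a string, exon coordinates are 1-based and inclusive, and the standard genetic code applies. The toy chromosome and all function names are hypothetical. The offsets 0, 1 and 2 used here correspond to what is more commonly called frames 1, 2 and 3.

```python
# Standard genetic code, built from the usual T/C/A/G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def spliced_sequence(chrom_seq, exons, strand):
    """Join exons (1-based inclusive (start, end) pairs) in transcript order."""
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(exons)]
    joined = "".join(pieces).upper()
    # On the minus strand, reverse-complement the joined genomic sequence.
    return reverse_complement(joined) if strand == "-" else joined

def translate(seq, offset=0):
    """Translate complete codons starting at the given offset (0, 1 or 2)."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_frame_translation(seq):
    return [translate(seq, offset) for offset in range(3)]

# Toy plus-strand chromosome: exons in upper case, intron and flanks in lower.
toy_chrom = "tt" + "ATGGC" + "gtaagtcccag" + "CAAATGA" + "tt"
toy_exons = [(3, 7), (19, 25)]
mrna = spliced_sequence(toy_chrom, toy_exons, "+")
print(mrna)                           # ATGGCCAAATGA
print(three_frame_translation(mrna))  # ['MAK*', 'WPN', 'GQM']
```

In the toy example the first exon is 5 bp long, so the codon GCC is split across the junction and is only read correctly because the exons are joined before translation.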

Reasons for confusion:

1. I am unsure whether it was correct to build the reading frames from the entire sequence (exons joined head to tail), as opposed to from each exon individually (before concatenation).
2. I am unsure whether a transcript can change reading frame from exon to exon during splicing, as this would very much complicate things.
3. I'm not certain whether a reading frame is always contained entirely within the AG-GU boundaries. In other words, is it possible for the G on either side to be included in the frame?
4. For protein inference, can a methionine occur in addition to the one at the start site, or is this invalid? For instance: MKPGCSRRKRCRTTG* (valid?), MKPGCSRMKRCRTTG* (invalid?)

Thanks!

-Jeremy

RNA-Seq Assembly • 2.6k views

Ram · Answer 1 · 2014-06-23

10.4 years ago

Devon Ryan 104k

There's no such thing as frame 0; you mean frame 1 there.

1. Concatenate first, then translate (I'm guessing you don't have a biology background).
2. See above.
3. The acceptor and donor sites are part of the intron, so they wouldn't normally be included (I'm sure someone has found an exception... biology is messy like that).
4. A protein can, and typically will, have more than one methionine.
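To make the points above concrete, here is a small sketch (not taken from the thread; the function names are hypothetical). The first function shows that, once the exons are concatenated, the transcript keeps one reading frame while each exon may begin partway through a codon. The second pulls candidate proteins out of a translated peptide and leaves internal methionines in place. The exon lengths are those of the four exons in the question, listed in transcript order for the minus strand.

```python
def exon_phases(exon_lengths, offset=0):
    """For each exon, the number of bases to skip from its first base to
    reach the start of the next complete codon, given a reading frame that
    starts at `offset` in the concatenated transcript."""
    phases = []
    exon_start = 0  # position of the exon's first base in the transcript
    for length in exon_lengths:
        phases.append((offset - exon_start) % 3)
        exon_start += length
    return phases

def candidate_proteins(peptide):
    """Return the stretches running from the first M after a stop (or after
    the peptide start) up to the next stop codon ('*')."""
    proteins = []
    # The last piece of the split has no terminating stop, so it is dropped.
    for segment in peptide.split("*")[:-1]:
        first_met = segment.find("M")
        if first_met != -1:
            proteins.append(segment[first_met:])
    return proteins

# Exon lengths from the question, 5' to 3' on the minus strand.
print(exon_phases([148, 127, 35, 4]))  # [0, 2, 1, 2]

# An internal methionine is simply part of the protein.
print(candidate_proteins("MKPGCSRMKRCRTTG*"))  # ['MKPGCSRMKRCRTTG']
```

The phases differ from exon to exon only because exon lengths are not multiples of three; the frame of the spliced transcript itself does not change, which is why translating exon by exon gives the wrong answer.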

You might find a local biologist to help you out with things.

Thanks Devon!

written 10.4 years ago by jacobsen.jeremy ▴ 40 • updated 3.1 years ago by Ram 44k