Question

Identify the transcript that codes for the longest protein from a gene annotation file

1


5.7 years ago

waqaskhokhar999 ▴ 160

I have the reference annotation file for Arabidopsis thaliana, and I want to identify the transcripts that code for the longest protein isoform and then extract the coordinates of those transcripts. For example, gene AT1G01020 contains 6 transcripts (AT1G01020.1, AT1G01020.2, AT1G01020.3, AT1G01020.4, AT1G01020.5, AT1G01020.6). How can I identify the transcript that codes for the longest protein and extract its coordinates?

The reference annotation file

Does it depend on the number of exons, the CDS regions, or the length of the exons?

RNA-Seq Reference annotation file • 2.0k views

ADD COMMENT • link updated 5.7 years ago by JC 13k • written 5.7 years ago by waqaskhokhar999 ▴ 160

score 0 · Answer 1 · 2019-08-29

0


5.7 years ago

JC 13k

Use Arabidopsis in BioMart to filter by "Gene stable ID" for your gene, select the "Structures" in "Attributes" and retrieve the values you need.

ADD COMMENT • link 5.7 years ago by JC 13k

0


I am able to select the protein-coding transcripts, but how can I select the transcript that codes for the longest protein? Secondly, if multiple transcripts of variable length code for proteins of similar length, which transcript should I select? For example, gene AT2G27490 contains 4 transcripts of variable length, but all of them code for a protein of 232 aa, so which one should I select?

ADD REPLY • link 5.7 years ago by waqaskhokhar999 ▴ 160

0


You select the largest one from the table. If you need to decide automatically, you will need to write some code to query and filter your selection. Which one to use when they have the same length is something you need to define based on what you are trying to do with that information.
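A minimal sketch of such a filter, assuming the BioMart result was exported as a tab-separated table. The column names used here (`Gene stable ID`, `Transcript stable ID`, `CDS Length`) are assumptions about the export header and may need adjusting, and the tie-break (alphabetically smallest transcript ID) is only a placeholder for whatever rule suits your analysis:

```python
import csv
import io


def pick_longest_cds(tsv_text, gene_col="Gene stable ID",
                     tx_col="Transcript stable ID", len_col="CDS Length"):
    """Return {gene_id: transcript_id} for the transcript with the longest CDS.

    The default column names are assumptions about the exported header.
    Ties are broken by the alphabetically smallest transcript ID, which is
    a placeholder rule, not a biological recommendation.
    """
    best = {}  # gene_id -> (cds_length, transcript_id)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if not row[len_col]:
            # Rows without a CDS length (e.g. non-coding transcripts) are skipped.
            continue
        gene, tx, length = row[gene_col], row[tx_col], int(row[len_col])
        current = best.get(gene)
        if (current is None or length > current[0]
                or (length == current[0] and tx < current[1])):
            best[gene] = (length, tx)
    return {gene: tx for gene, (length, tx) in best.items()}
```

Calling `pick_longest_cds(open("mart_export.txt").read())` (the file name is hypothetical) would give one transcript ID per gene, which can then be used to look up coordinates in the same table.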

ADD REPLY • link 5.7 years ago by JC 13k

0


The longest transcript does not necessarily code for the longest protein, as it can also contain retained introns or parts of introns. How can I identify the transcript that codes for the longest protein?
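One way to approach this, sketched under stated assumptions: protein length follows from the summed length of a transcript's CDS segments, not from the transcript or exon length, so the CDS features in the annotation can be added up per transcript and compared within each gene. The sketch below assumes a GFF3 file in which `mRNA` features carry `ID` and `Parent` (the gene) attributes and `CDS` features carry a `Parent` pointing to the mRNA. The function names are hypothetical, and when several transcripts tie, the first one listed in the file is kept, which is an arbitrary choice.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def longest_cds_transcripts(gff3_lines):
    """Return {gene_id: (transcript_id, seqid, start, end, strand, cds_length)}."""
    transcripts = {}               # transcript_id -> (gene_id, seqid, start, end, strand)
    cds_length = defaultdict(int)  # transcript_id -> summed CDS length in nucleotides
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _, ftype, start, end, _, strand, _, attr_field = cols
        attrs = parse_attributes(attr_field)
        start, end = int(start), int(end)
        if ftype == "mRNA":
            transcripts[attrs["ID"]] = (attrs["Parent"], seqid, start, end, strand)
        elif ftype == "CDS":
            # A CDS may list several parents separated by commas.
            for parent in attrs["Parent"].split(","):
                cds_length[parent] += end - start + 1  # GFF3 is 1-based, inclusive

    best = {}
    for tx_id, (gene, seqid, start, end, strand) in transcripts.items():
        length = cds_length.get(tx_id, 0)
        if length == 0:
            continue  # no CDS annotated for this transcript
        # Strict '>' keeps the first transcript in file order when lengths tie.
        if gene not in best or length > best[gene][5]:
            best[gene] = (tx_id, seqid, start, end, strand, length)
    return best
```

The returned tuple already holds the transcript coordinates. Dividing `cds_length` by three gives the approximate number of codons; whether that count includes the stop codon depends on how the annotation defines its CDS features.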