Question

Transcript vs primary transcript on Phytozome

0

2.5 years ago

rijan_dhakal ▴ 10

Could someone help me understand the difference between "transcript" and "primary transcript" on Phytozome? For example, this dataset of A. thaliana has "primary transcript CDS" vs "CDS".

Off the top of my head, the primary transcript is the initial strand of RNA made from DNA, and transcripts are what you get after post-processing of that initial strand.

What confuses me is why the transcript file is larger than the primary transcript file. The transcript file has about 21k more headers than the "primary transcript" file. The only explanation I can think of is that alternative splicing produces a bunch of isoforms, each of which adds a header to the transcript file. Is that it?
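One quick way to test the isoform guess is to count how many headers share a gene ID. This is a minimal sketch, assuming transcript IDs carry a trailing `.N` isoform suffix (for example `AT1G01020.2`); that naming convention and the example headers below are assumptions for illustration, not taken from the Phytozome files.

```python
import re
from collections import Counter


def isoforms_per_gene(fasta_lines):
    """Count FASTA headers per gene ID.

    Assumption: the first word of each header is a transcript ID whose
    trailing '.<number>' is an isoform suffix, e.g. 'AT1G01020.2'.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # empty header, nothing to count
        gene_id = re.sub(r"\.\d+$", "", fields[0])
        counts[gene_id] += 1
    return counts


# Made-up records: one gene with one isoform, one gene with two.
example = [
    ">AT1G01010.1 example header",
    "ATGGAG",
    ">AT1G01020.1 example header",
    "ATGGCG",
    ">AT1G01020.2 example header",
    "ATGGCGTAA",
]

counts = isoforms_per_gene(example)
n_transcripts = sum(counts.values())
n_genes = len(counts)
print(n_transcripts, n_genes, n_transcripts - n_genes)
```

If the extra headers in the transcript file all come from genes with more than one isoform, the difference between the two totals should match the difference in header counts between the two files.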

primary transcripts phytozome • 2.5k views

ADD COMMENT • link updated 2.5 years ago by Istvan Albert 103k • written 2.5 years ago by rijan_dhakal ▴ 10

2

"Primary transcript" likely means the one canonical transcript identified for each gene. You will need to confirm that this is how Phytozome uses the term. (Analogous explanation from Ensembl: https://useast.ensembl.org/info/genome/genebuild/canonical.html )

ADD REPLY • link 2.5 years ago by GenoMax 153k

0

GenoMax Unfortunately that is the issue, I cannot find a description page for this on Phytozome's side.

ADD REPLY • link 2.5 years ago by rijan_dhakal ▴ 10

1

Your best bet may be to write to their help desk and ask. Post their response to provide closure to this thread when you hear back from them.

ADD REPLY • link 2.5 years ago by GenoMax 153k

0

You are correct that the primary transcript is the initial strand of RNA synthesized from DNA, and it undergoes post-transcriptional processing to generate mature transcripts that are exported from the nucleus to the cytoplasm for translation. The mature transcripts can be alternatively spliced, resulting in multiple isoforms from a single gene.

In the context of the Phytozome database, the primary transcript CDS refers to the coding sequence of the primary transcript, which is the DNA sequence that encodes the protein product. The CDS annotation is based on computational prediction and experimental evidence, such as RNA sequencing data. The primary transcript CDS represents the annotated protein-coding gene models that are used as a reference for downstream analyses.

The "transcript" file that you are referring to likely contains all the transcript isoforms generated from the primary transcript due to alternative splicing or other post-transcriptional modifications. These isoforms may have different start and stop codons or different exon-intron structures, leading to differences in the number and length of the CDS regions. Therefore, the transcript file is larger than the primary transcript file.

In summary, the primary transcript CDS represents the reference gene model based on the primary transcript, while the transcript file contains all the transcript isoforms generated from the primary transcript due to post-transcriptional modifications.

ADD REPLY • link 2.5 years ago by iam.zahid.hussain.official ▴ 20

0

GenoMax I suspect this answer is copy-pasted from ChatGPT. The phrasing and structure are eerily similar to the answers generated by the bot, and this is a new account. I'm not sure what the platform's policy on this is, but either way I want to bring it to your attention.

ADD REPLY • link 2.5 years ago by rijan_dhakal ▴ 10

1

ChatGPT answers would be allowed if clearly marked as such, but are not acceptable when presented as a personal contribution.

ADD REPLY • link 2.5 years ago by Istvan Albert 103k

score 2 · Accepted Answer · 2023-03-03

According to a team lead at Phytozome:

Multi-exon genes can produce different transcripts via the process of alternative splicing: inclusion/exclusion of particular exons in the spliced transcript. Designation of one of these transcripts as "primary" will vary based on the genome under consideration: the primary might be the one with the highest expression under "normal" conditions, it might be the one whose mutations were first studied, or it might simply be the transcript with the longest CDS.

In all cases:

Number_of_transcripts >= Number_of_primary_transcripts
Number_of_primary_transcripts == Number_of_genes
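The two relations above can be checked on any pair of files. Here is a minimal sketch that picks one transcript per gene using the "longest CDS" rule, which is only one of the criteria mentioned in the quote and is not necessarily how Phytozome chose the primary for a given genome. It assumes each FASTA header carries a `locus=<gene ID>` field and falls back to stripping the isoform suffix otherwise; both the header layout and the helper names are assumptions for illustration.

```python
def read_fasta(lines):
    """Return {header: sequence} from an iterable of FASTA lines."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def locus_of(header):
    """Gene ID for a header (assumed 'locus=' field, else ID minus '.N')."""
    for field in header.split():
        if field.startswith("locus="):
            return field[len("locus="):]
    return header.split()[0].rsplit(".", 1)[0]


def longest_per_locus(records):
    """Pick the header with the longest sequence for each locus."""
    best = {}
    for header, seq in records.items():
        locus = locus_of(header)
        if locus not in best or len(seq) > len(records[best[locus]]):
            best[locus] = header
    return best


# Made-up CDS records: two genes, three transcripts.
example = [
    ">AT1G01010.1 locus=AT1G01010",
    "ATGGAGTAA",
    ">AT1G01020.1 locus=AT1G01020",
    "ATGGCGTAA",
    ">AT1G01020.2 locus=AT1G01020",
    "ATGGCGGCGTAA",
]

records = read_fasta(example)
primary = longest_per_locus(records)

n_transcripts = len(records)
n_primary = len(primary)
n_genes = len({locus_of(h) for h in records})
print(n_transcripts, n_primary, n_genes)
```

On real data, `n_primary` computed this way always equals `n_genes` by construction; the useful comparison is between `n_genes` from the full transcript file and the header count of the downloaded primary transcript file.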