Hello everyone! Thanks for reading this post.
I am a beginner in bioinformatics and am currently annotating a mitochondrial genome. However, my assembly is not circular, so it is only at the contig/scaffold level.
For genome annotation, I found some papers that report "trans-spliced" genes whose exons are distributed across different scaffolds. I want to know how they find and annotate a gene whose exons and introns are spread over different scaffolds, using only the genome sequence. The papers only say that the genome annotation was corrected manually.
If you know how to do this, please help me; I've been stuck here for weeks. Thank you so much!
I don't know for sure, but I suspect this would be done by comparison to orthologues from other species.
Thank you very much for your comment. I am now trying to use https://www.arabidopsis.org/ to compare the CDS to the proteins with BLASTX.
To my knowledge, trans-splicing only occurs in mRNA, so I don't understand how you can annotate trans-splicing events without both a transcriptome and a genome. Can you link the paper?
Thank you very much for your comment. I am sorry for expressing this badly; I meant that only the genomic DNA sequence is used.
Here is one paper as an example: https://doi.org/10.1186/s12864-022-08993-9
The paper says: "Exons of the following genes were annotated on at least two different scaffolds: nad1, nad2, nad4, nad5, nad7, and rpl2. The two exons of cox2 were annotated on scaffold 1 far apart from each other (98,582 bp in between)."
I wonder how they know it is the same gene, not a different gene with the same name.
Ah, this makes more sense. You should do some reading on how gene annotations are assigned. To oversimplify, you map transcripts to a genome and see where the exons end up, often using a sequenced transcriptome as the input.
I don't understand what you mean by "a different gene with the same name". Orthologous genes often get the same name (e.g., cox2 in this instance), and this is based on high sequence identity between different genome assemblies. Whether they have the same function when identified over large evolutionary distances and sequence divergence is a different question, but they are often assumed to have similar functions.
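To make the idea of naming and ordering by sequence similarity concrete, here is a minimal sketch, not the method of any particular paper. It assumes you have used a reference protein from a related species as a tblastn query against your scaffolds and saved the hits in standard tabular format (-outfmt 6). The function name order_exon_hits, the e-value cutoff, and the example hits are all hypothetical. Hits to the same query protein are grouped together, then sorted by their position along that protein, which gives a candidate exon order even when the pieces sit on different scaffolds or strands.

```python
import csv
from collections import defaultdict
from io import StringIO

# Column order of standard BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def order_exon_hits(blast_tsv, max_evalue=1e-5):
    """Group hits by query protein and sort them by position along the protein.

    Hypothetical helper: the cutoff is an arbitrary illustration value.
    """
    by_gene = defaultdict(list)
    for row in csv.reader(StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue  # drop weak hits
        sstart, send = int(hit["sstart"]), int(hit["send"])
        by_gene[hit["qseqid"]].append({
            "scaffold": hit["sseqid"],
            # In tblastn output, sstart > send means a reverse-strand hit.
            "strand": "+" if sstart <= send else "-",
            "start": min(sstart, send),
            "end": max(sstart, send),
            "qstart": int(hit["qstart"]),
            "qend": int(hit["qend"]),
        })
    for hits in by_gene.values():
        # The position in the reference protein defines the exon order.
        hits.sort(key=lambda h: h["qstart"])
    return dict(by_gene)


# Made-up hits for illustration only: one reference protein matching two scaffolds.
EXAMPLE_HITS = "\n".join([
    "nad5_ref\tscaffold_3\t92.1\t140\t11\t0\t250\t389\t5200\t4781\t1e-80\t250",
    "nad5_ref\tscaffold_1\t95.0\t249\t12\t0\t1\t249\t1000\t1746\t1e-150\t480",
])

for gene, hits in order_exon_hits(EXAMPLE_HITS).items():
    for number, h in enumerate(hits, start=1):
        print(gene, "segment", number, h["scaffold"], h["strand"], h["start"], h["end"])
```

BLAST hit ends only approximate the true exon boundaries, so the exact splice sites would still need manual inspection. If a gene is duplicated, overlapping query ranges among the hits (two hits covering the same part of the protein) are a sign that you have more than one copy to sort out.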
Thank you very much for your suggestions; I will read more articles on gene annotation. Yes, the transcriptome is used to annotate exons, UTRs, etc. However, I have found that some mitochondrial genome papers use a transcriptome and some don't, and I don't understand how trans-spliced genes can be annotated without transcriptome support. By "whether it is the same gene", I mean a complete gene, which usually has a complete structure. With trans-splicing, if a gene such as cox2 is duplicated, is it impossible to determine which exon belongs to which copy of cox2?
I also want to figure out how they determine the order of exons and introns, even when these lie on different scaffolds.
In the NCBI record https://www.ncbi.nlm.nih.gov/nuccore/ON378819.1/ , the annotation shows: