Question

Genome annotation (providing evidence from the RNA-seq raw reads)

0

15 months ago

BenAawf ▴ 10

Dear all, I have a question and would like help answering it in terms of the technical process (from mapping to quantification). I recently annotated the genome of a eukaryotic species. After combining three methods using EVidenceModeler, the annotation yielded a total of 30,000 protein-coding genes. I was then asked to report how many of those genes are supported by RNA-seq raw data. Note that the RNA-seq raw data were generated years ago from different tissues and from different individuals than the one I assembled and annotated, but all from the same species.

By mapping the raw reads to the CDS sequences of the assembled genome and quantifying the abundance of the raw reads with respect to each CDS sequence, I can find a way to respond to this question. But I still need to think about this approach. Is there any other approach or standardized method to reply to this question?
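As a rough illustration of the quantification step described above, here is a minimal sketch that tallies how many annotated genes reach a read-count threshold in at least one sample. The table layout, the sample names, and the `min_reads` / `min_samples` cutoffs are assumptions chosen for illustration, not a standard; in practice the counts would come from whatever mapping and counting tools are used.

```python
import csv
import io


def supported_genes(counts_tsv, min_reads=10, min_samples=1):
    """Return (supported_gene_ids, total_genes) from a gene-by-sample count table.

    counts_tsv: tab-separated text, first column is the gene ID and the
    remaining columns are raw read counts per sample (hypothetical layout).
    A gene counts as supported if at least `min_samples` samples have
    `min_reads` or more reads assigned to it.
    """
    reader = csv.reader(io.StringIO(counts_tsv), delimiter="\t")
    next(reader)  # skip the header row
    supported = []
    total = 0
    for row in reader:
        if not row:
            continue
        total += 1
        gene_id = row[0]
        counts = [int(c) for c in row[1:]]
        n_ok = sum(c >= min_reads for c in counts)
        if n_ok >= min_samples:
            supported.append(gene_id)
    return supported, total


# Toy table with made-up gene IDs and tissue names.
example = (
    "gene_id\ttissue_A\ttissue_B\ttissue_C\n"
    "g1\t120\t0\t15\n"
    "g2\t0\t0\t0\n"
    "g3\t3\t45\t2\n"
    "g4\t4\t9\t1\n"
)

supported, total = supported_genes(example)
print(f"{len(supported)} of {total} genes supported: {supported}")
```

The threshold matters: a strict cutoff under-reports lowly expressed genes, so it is worth reporting the numbers at more than one cutoff.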

I really appreciate any help you can provide.

RNAseq genome annotation • 685 views

ADD COMMENT • link updated 15 months ago by GenoMax 149k • written 15 months ago by BenAawf ▴ 10

score 1 · Answer 1 · 2023-11-20

1

15 months ago

GenoMax 149k

Predictions made by bioinformatic analysis are just that until proven with an independent experimental method. So showing the evidence that the gene is actually expressed or creating a knockout/knockdown to show phenotypic effect would be one way to prove that. You could also show homology to other published proteins that are experimentally proven to exist/work.

Keep in mind that, even assuming those 30K genes are real, some may be expressed only in a very specific cell type or at a specific time in development, so you may never see them in a given set of experiments.

ADD COMMENT • link 15 months ago by GenoMax 149k

0

Thanks, GenoMax. From a bioinformatics perspective, do you find my approach logical for broadly answering this request?

ADD REPLY • link 15 months ago by BenAawf ▴ 10

0

"By mapping the raw reads to the CDS sequences of the assembled genome and quantifying the abundance of the raw reads with respect to each CDS sequence, I can find a way to respond to this question."

I assume you are referring to using the old RNA-seq data? You should align the reads to the entire genome and then see where the reads align and how well they support your gene predictions. Again, a negative result would not mean the prediction is wrong, but if reads align to parts of the genome where you did not predict a coding sequence, then you will need to check your predictions in that region.
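To make the genome-level check concrete, here is a small sketch that takes predicted gene intervals and read alignment intervals, counts the reads overlapping each gene, and lists the alignments that fall outside every prediction. The tuple layout, the 1-based closed coordinates, and the function names are assumptions for illustration; with real data the intervals would be parsed from the annotation file and the aligner output (ideally from a splice-aware aligner), and an interval index would replace the simple loop.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def summarize_support(genes, alignments):
    """Count reads per predicted gene and collect reads outside all genes.

    genes: list of (chrom, start, end, gene_id) tuples (hypothetical layout)
    alignments: list of (chrom, start, end) tuples, one per aligned read
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, gene_id in genes:
        by_chrom[chrom].append((start, end, gene_id))

    reads_per_gene = {gene_id: 0 for _, _, _, gene_id in genes}
    unannotated = []
    for chrom, start, end in alignments:
        hit = False
        for g_start, g_end, gene_id in by_chrom.get(chrom, []):
            if overlaps(start, end, g_start, g_end):
                reads_per_gene[gene_id] += 1
                hit = True
        if not hit:
            # Read aligns where no coding sequence was predicted.
            unannotated.append((chrom, start, end))
    return reads_per_gene, unannotated


# Toy data with made-up coordinates.
genes = [
    ("chr1", 100, 500, "g1"),
    ("chr1", 800, 1200, "g2"),
    ("chr2", 50, 300, "g3"),
]
alignments = [
    ("chr1", 150, 250),
    ("chr1", 480, 580),
    ("chr1", 600, 700),
    ("chr2", 400, 500),
    ("chr1", 1190, 1290),
]

reads_per_gene, unannotated = summarize_support(genes, alignments)
print(reads_per_gene)
print(unannotated)
```

Genes with zero reads here are unsupported by this dataset only, not disproven, while clusters of reads in `unannotated` point to regions where the predictions deserve a second look.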

ADD REPLY • link 15 months ago by GenoMax 149k