Gene prediction programs predicting different genes
4.5 years ago
AP ▴ 80

Hello everyone,

Over the past couple of weeks I have run into a problem that is giving me a lot of stress, and I hope you can give me some suggestions. A year ago I did an RNAseq analysis in a fungal species using a genome file and genes predicted by AUGUSTUS (generated by a lab member). Based on that analysis I selected a candidate gene that I found really interesting and did some benchwork to knock it out. Recently, my colleague used my RNAseq data to improve the gene annotation with a program called Funannotate, which predicted more genes than AUGUSTUS. Because we now have an improved genome file and annotation file, I decided to revisit my RNAseq data and found some differentially expressed genes (more than before, maybe because of the new annotation file).

The candidate gene that I selected for knockout was 1.6 Kb. But from the new analysis I found that this 1.6 Kb region has two genes based on new annotation from funannotate and are 178 bp apart from each other. I searched the 178 bp for promoter sequences but could not find any. Should I consider this 1.6 Kb region as one gene or two genes? I don't know which prediction tool to trust. Please advise on what I should do.

Thank you, Ambika

genome annotation RNAseq Funannotate augustus • 2.0k views

But from the new analysis I found that this 1.6 Kb region has two genes based on new annotation from funannotate and are 178 bp apart from each other.

This is a tough one. Unless there is clear experimental evidence from independent RNAseq (and/or other experiments) you should consider this a prediction.


Genomax, what about the fact that the 178 bp region does not have a promoter sequence? Can we still consider the second part as a separate gene?

4.5 years ago
JC 13k

As you pointed out, you are predicting genes. Any prediction program has some error rate, and the only way to get the real structure is validation. In the lab, you can validate your genes simply by RT-PCR: design primers to test whether your sequences are connected or not, extract RNA, and do the RT-PCR. Also, 178 bp is too short to be a promoter; it sounds like an intronic region between exons, so look for splicing signals.
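A quick first look for splicing signals can be scripted. The sketch below is only illustrative: it scans a sequence for spans that begin with the canonical donor dinucleotide GT and end with the canonical acceptor AG. The function name, the toy sequence, and the length limits (`min_len`, `max_len`) are assumptions made for the example, not values taken from this thread. A real intron need not coincide exactly with the 178 bp gap, so it makes sense to scan the gap plus some flanking sequence.

```python
def candidate_introns(seq, min_len=40, max_len=400):
    """Return (start, end) spans, 0-based half-open, that start with GT and end with AG.

    min_len and max_len are illustrative limits, not organism-specific values.
    """
    seq = seq.upper()
    # Start positions of every GT (possible splice donor).
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    # End positions (exclusive) of every AG (possible splice acceptor).
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors if min_len <= a - d <= max_len]


# Toy sequence: 6 bp of "exon", a 50 bp GT...AG span, then 6 bp of "exon".
toy = "ATGAAA" + "GT" + "C" * 46 + "AG" + "GGGTAA"
spans = candidate_introns(toy)
print(spans)
```

A GT...AG pair alone is weak evidence, since both dinucleotides occur often by chance. Spliced RNA-seq reads or RT-PCR across the region are what would actually confirm an intron.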


JC,

I did RT-qPCR before to validate the gene expression, and I designed my primers around the 3' end, so the amplicon comes only from the second gene. So now, to confirm, do you think I need to design primers that amplify a region spanning both genes? Thanks for the suggestion, I will look for the splicing signals too.


At least you seem to have knocked out only one gene (if there are indeed two close by). Something like a RACE assay is how you could try to distinguish whether there are indeed two genes.


Genomax,

I will look into this RACE assay; it would be great if it helps me distinguish the genes. Thanks


Did you do RT-PCR on your knockout strain? If you used the same primers, that can also help validate that it is one gene and not two.


JC, I have not done RT-PCR on my knockout strain; however, this is something I can do soon. Thanks for the suggestion!!!

4.5 years ago
alex.zaccaron ▴ 470

Gene prediction is not perfect, and mis-annotated genes are quite common. Because fungi usually have very compact genomes, predicted genes that are merged (two genes called as one) or split (one gene called as two) occur frequently. For non-model organisms it is always a good idea to manually curate genes before downstream experiments, e.g. knockout. Take a look at homologs of your gene in other species to see how they are annotated, and also inspect the predicted gene structure for anything odd, e.g. introns that are too long or too short, incompatibility with RNA-seq data, etc.


Alex, I BLASTed my gene at NCBI and found it in a closely related species, where the gene is also around 1.6 Kb. The interesting thing is that both genes predicted by Funannotate have the same PFAM domain. Because of this I thought there might have been a gene duplication, but the two are not similar.


Sounds like your initial prediction (Augustus) is in better accordance with homologs in other species, and therefore more likely to be correct. Regarding the two genes predicted by Funannotate, you actually would see the same PFAM domain in both; imagine that the conserved domain was "broken" into two genes. If you see good query coverage of your Augustus prediction when you BLAST, and the predicted protein sequence is about the same compared with the homologs, I would trust this annotation instead of the Funannotate one. But you should nonetheless visually inspect the prediction in a genome browser, e.g. IGV, along with a BAM file of the mapped RNA-seq reads.
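If you want a number for "good query coverage" rather than eyeballing the BLAST page, something like the sketch below could work. It assumes you have already collected the query start and end coordinates of each HSP against one homolog (for example the `qstart` and `qend` columns of BLAST tabular output). The function name and the example coordinates are made up for illustration.

```python
def query_coverage(hsps, query_length):
    """Fraction of the query covered by at least one HSP.

    hsps: iterable of (qstart, qend) pairs, 1-based inclusive, in either orientation.
    """
    covered = 0
    current_end = 0  # last query position already counted
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsps):
        # Skip the part of this HSP that overlaps positions already counted.
        start = max(start, current_end + 1)
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / query_length


# Hypothetical HSPs of a 530-residue predicted protein against one homolog.
example_hsps = [(1, 200), (150, 400), (450, 500)]
print(round(query_coverage(example_hsps, 530), 3))
```

A merged model that is really two genes would tend to show homologs covering only one half of the query, while a model that was wrongly split would show each fragment covering only part of a longer homolog.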


Alex, thanks for the suggestion. I will surely try this.

4.5 years ago
Juke34 8.9k

Map your RNAseq reads to your genome (with a splice-aware aligner, e.g. HISAT2 or STAR) and load your genome along with the annotation and the BAM file in a genome browser (e.g. JBrowse). If you have reads in this region, it might be easy to tell visually whether two loci have been merged into one.
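The same check can be roughed out in code as a complement to the browser. The sketch below assumes each read has already been reduced to its aligned blocks as 0-based half-open `(start, end)` intervals on the contig. Reading them from the BAM file is not shown; with pysam, `AlignedSegment.get_blocks()` returns intervals of this shape, though deletions also split blocks there. The function name, category names, and coordinates are invented for the example.

```python
def classify_reads(read_blocks, gap_start, gap_end):
    """Count reads by how they relate to the gap [gap_start, gap_end).

    read_blocks: list of reads, each a list of (start, end) aligned blocks.
    """
    counts = {"spliced_over_gap": 0, "aligned_in_gap": 0, "other": 0}
    for blocks in read_blocks:
        blocks = sorted(blocks)
        # Unaligned stretches between consecutive blocks of the same read.
        skipped = [(a_end, b_start)
                   for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])
                   if b_start > a_end]
        if any(s < gap_end and e > gap_start for s, e in skipped):
            counts["spliced_over_gap"] += 1   # read jumps across the gap
        elif any(s < gap_end and e > gap_start for s, e in blocks):
            counts["aligned_in_gap"] += 1     # read has aligned bases inside the gap
        else:
            counts["other"] += 1
    return counts


# Toy reads around a hypothetical 178 bp gap at positions 1000-1178.
toy_reads = [
    [(950, 1000), (1178, 1228)],  # spliced across the gap
    [(980, 1080)],                # runs into the gap
    [(800, 900)],                 # upstream only
    [(1200, 1300)],               # downstream only
]
print(classify_reads(toy_reads, 1000, 1178))
```

Many reads spliced over the gap would point to a single gene with an intron. Reads aligned inside the gap are harder to interpret, because UTRs of two neighbouring genes can also produce coverage there.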


Juke, I have a BAM file generated from a STAR alignment. I will use this file to see if it's merged or not. Thanks!!
