I started working on EST assembly and annotation of a plant species some months back; at that time, the complete genome sequence was not available. The species' genome has since been sequenced and annotated, and it is now in the list of reference genomes. In the meantime I have finished EST pre-processing and assembly. Can I still use my EST assembly for gene finding in the species' own genome sequence, to find additional information beyond the existing annotation, which was built using homology-based searches and ab initio methods? Would that be relevant? Can I map the EST contigs and singletons onto the genome, and if so, what tools are available for this?
How large is the genome of your plant species? If it is not too large, you may be able to give MAKER a go. Alternatively, if the species is of interest to Ensembl Plants, they may be able to provide an annotation of that genome. Check with their helpdesk.
The genome size is nearly 15 Gb. The annotation report is available at NCBI, and they also used EST data for it. My question is how to use my EST assembly to find something new and relevant in this case.
This is a guess, but ESTs don't sample the transcriptome as deeply as RNA-Seq does, particularly for a not-so-well-studied organism, unless ESTs exist from many sources under a variety of conditions. Since the annotation was already done using ESTs, finding new transcribed regions should be rare. But you can try mapping the ESTs to the genome using any splice-aware mapper or command-line BLAT, and then use bedtools/samtools to pull out regions that have EST coverage but no annotation.
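To make the last step concrete, here is a minimal Python sketch of the "covered but not annotated" filter. It is only an illustration under some assumptions: the EST alignment blocks and the annotated features have already been converted to BED-style (chrom, start, end) tuples with 0-based, half-open coordinates, and the function names `merge_intervals` and `unannotated_regions` are made up for this example. On real data, `bedtools merge` followed by `bedtools intersect -v` would do roughly the same job and scale much better for a 15 Gb genome.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def unannotated_regions(est_blocks, annotations):
    """Return merged EST-covered regions that overlap no annotated feature.

    Both arguments are iterables of (chrom, start, end) tuples using
    0-based, half-open coordinates (the BED convention).
    """
    est_by_chrom = defaultdict(list)
    for chrom, start, end in est_blocks:
        est_by_chrom[chrom].append((start, end))

    ann_by_chrom = defaultdict(list)
    for chrom, start, end in annotations:
        ann_by_chrom[chrom].append((start, end))

    novel = []
    for chrom in sorted(est_by_chrom):
        annotated = merge_intervals(ann_by_chrom.get(chrom, []))
        for start, end in merge_intervals(est_by_chrom[chrom]):
            # Two half-open intervals overlap when each starts before
            # the other ends.
            overlaps = any(a_start < end and start < a_end
                           for a_start, a_end in annotated)
            if not overlaps:
                novel.append((chrom, start, end))
    return novel


# Toy data, invented for illustration only.
est_blocks = [("chr1", 100, 200), ("chr1", 150, 300),
              ("chr1", 1000, 1100), ("chr2", 50, 80)]
annotations = [("chr1", 250, 400), ("chr2", 500, 600)]

for region in unannotated_regions(est_blocks, annotations):
    print(*region, sep="\t")
```

Note that this version drops a whole merged EST cluster as soon as any part of it touches an annotated feature. If you would rather keep the unannotated portion of such a cluster (for example a possible UTR extension), an interval subtraction would be the better fit.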
Yes, that sounds reasonable and relevant. Thanks a lot.
BLAT would be a good starting option. It can do spliced alignments.
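In case it helps, a spliced alignment shows up in the PSL output as several blocks on the genome, with the gaps between them being the putative introns. Below is a rough Python sketch for reading one PSL record into genomic blocks. It assumes a standard 21-column, tab-separated PSL line from a nucleotide-vs-nucleotide run (single-character strand field, so the target starts are plus-strand coordinates); the function names `psl_blocks` and `intron_gaps` and the example record are invented for illustration. Header lines would need to be skipped first, or the output written with `-noHead`.

```python
def psl_blocks(psl_line):
    """Parse one PSL record into (query_name, [(chrom, start, end), ...]).

    Coordinates are 0-based, half-open, on the target (genome) sequence.
    """
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) != 21:
        raise ValueError("expected 21 tab-separated PSL columns, got %d"
                         % len(fields))
    query_name = fields[9]    # qName: the EST contig or singleton
    target_name = fields[13]  # tName: the genomic scaffold/chromosome
    # blockSizes and tStarts are comma-separated with a trailing comma.
    block_sizes = [int(x) for x in fields[18].split(",") if x]
    target_starts = [int(x) for x in fields[20].split(",") if x]
    blocks = [(target_name, start, start + size)
              for start, size in zip(target_starts, block_sizes)]
    return query_name, blocks


def intron_gaps(blocks):
    """Return the genomic gaps between consecutive alignment blocks."""
    return [(chrom, prev_end, next_start)
            for (chrom, _, prev_end), (_, next_start, _)
            in zip(blocks, blocks[1:])]


# A made-up record: a 150 bp EST contig aligned in two blocks.
example = "\t".join([
    "150", "0", "0", "0",     # matches, misMatches, repMatches, nCount
    "0", "0", "1", "1000",    # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    "+", "contig1", "150", "0", "150",    # strand, qName, qSize, qStart, qEnd
    "chr1", "100000", "1000", "2150",     # tName, tSize, tStart, tEnd
    "2", "100,50,", "0,100,", "1000,2100,",  # blockCount, blockSizes, qStarts, tStarts
])

name, blocks = psl_blocks(example)
print(name, blocks, intron_gaps(blocks))
```

The blocks come out in BED-style coordinates, so they can be written to a BED file and compared against the existing annotation. Not every gap is a real intron (short ones can be alignment artifacts), so some minimum gap length is worth applying before calling anything spliced.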