I am trying to perform gene prediction after a de novo assembly of DNA-seq reads (E. coli).
After producing the scaffolds, I used bowtie2 to map ESTs (a random selection of E. coli ESTs) to the scaffolds. As a result, I have SAM/BAM files containing the alignments of the evidence (the ESTs) to the scaffolds. My goal is to identify gene regions on the scaffolds.
The classic paper "A beginner’s guide to eukaryotic genome annotation" suggests clustering the alignments in order to identify overlapping alignments and predictions. Any practical ideas on how to do that?
Thanks
PS1: I would prefer either a) a manual approach (simple steps) or b) Python/Bash-based toolkits.
PS2: An overview of the SAM file (alignments):
@HD VN:1.0 SO:unsorted
@SQ SN:scaffold1|size105789 LN:105789
@SQ SN:scaffold2|size142352 LN:142352
@SQ SN:scaffold3|size57540 LN:57540
....
@SQ SN:scaffold132|size37 LN:37
@PG ID:bowtie2 PN:bowtie2 VN:2.3.3 CL:"/usr/bin/bowtie2-align-s--wrapper basic-0 -f -x SRR001665_scaffolds -S SRR001665_on_scaffolds.sam -U ESTS/seven_ests.fasta"
gi|14475471|gb|BI067949.1| 4 * 0 0 * * 0 0 AGTGTATGATGGTGTTTTTGAGGTGCTCCAGTGGCTTCTGTTTCTATCNNCTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTATCTCTGCTCTCACTGCCGTAAAACATGGCAACTGCAGTTCACTTACACCGCTTCTCAACCCGGTACGCACCAGAAAATCATTGATATGGCCATGAATGGCGTTGGATGCCGGGCAACAGCCCGCATTATGGGCGTTGGCCTCAACACGATTTTACGTCACTTAAAAAACTCAGGCCGCAGTCGGTAACCTCGCGCATACAGCCGGGCAGTGACGTCATCGTCTGCGCGGAAATGGACGAACAGTGGGGCTATGTCGGGGCTAAATCGCGCCAGCGCTGGCTGTTTTACGCGTATGACAGTCTCCGGAAGACGGTTGTTGCGCACGTATTCGGTGAACGCACTATGGCGACGCTGGGGCGTCTTATGAGCCTGCTGTCACCCTTTGACGTGGTGATATGGATGACGGATGGCTGGCCGCTGTATGAATCCCGCCTGAAGGGAAAGCTGCACGTAATCAGCAAGCGATATACGCAGCGAATTGAGCGGCATAACCTGAATCTGAGGCAGCACCTNNNNCGNNN IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
gi|14007620|gb|BG713670.1| 0 scaffold38|size43565 37568 42 629M * 0 0 ACACAAAGAAAAATTGAATAAACTGTATGATTTAAAAGATTATCGGGAGAGTTACCTCCCGATATAAAAGGAAGGATTTACAGAATGTGACCTAAGGTCTGGCGTAAATGTGCACCGGAACCGAGAAGGCCCGGATTGTCATGGACGATGAGATACACCGGAATATCATGGACATATTCTTTAAAGCGCCCTTTATCTTCAAATGCGGCACGGAAACCGGAGGCTTTGAAGAACTCAAGGAAGCGCGGCACGATACCGCCCGCAATAAACACGCCGCCAAATGTCCCGAGATTGAGCGCCAGATTGCCGCCAAAACGGCCCATAATGACGCAAAACAGCGACAATGCGCGGCGGCAATCGGTGCAGCTGTCAGCCAGCGCGCGTTCGGTAATATCTTTTGGCTTGAGATTTTCTGGCAGGCGGTTGTCAGCTTTCACAATTGCGCGATACAAATTCACCAGCCCAGGGCCAGAAAGCACGCGCTCCGCCGAAACATGACCAATTTCCGCACGCAATATTTCGAGGATAATGGCCTCTTCTTCACTATTCGGCGCAAAATCAACGTGACCGCCTTCGCCTGGCAAGCTTACCCAACGCTTATCGACATGGACCNNNTGCGCAACCCCAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-9 XN:i:0 XM:i:4 XO:i:0 XG:i:0 NM:i:4 MD:Z:612A0G0A13G0 YT:Z:UU
gi|14007330|gb|BG713380.1| 16 scaffold21|size132647 11225 40 484M * 0 0 GGTTGGCTGGGGGTATTCTTGCCCGGGTCNNATACGTCATCTAACGCCCTGTTCGCCGCGCTGCAAGCCGCCGCAGCTCANCAAATTGGCGTCTCTGATCTGTTGNNGGTTGCCGCCAATACCACCGGTGGCGTCGCCGGTAAGATGATCTCCCCGCAATCTATCGCTATCGCCTGTACGGCGGTAGGCCTGGTGGGCAAAGAGTNNGATTTGTTCCGCTTTACTGTCAAACACAGCCTGATCTTCACCTGTATAGTGGGCGTGATCACCACGCTTCAGGCTTATGTCTTAACGTGGATGATTCCTTAATGATTGTTTTACCCAGACGCCTGTCAGACGAGGTTGCCGATCGTGTGCGGGCNNNNNNTGATGAAAAAAACCTGTAAGCGGGCATGAAGTTGCCCGCTGAGCGCCAACTGGNTATGCAACTCGGCGTATCACGTCATTCACTGCGCGAGGCGCTGGCAAAACTGGTGNNNGAAGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-83 XN:i:0 XM:i:28 XO:i:0 XG:i:0 NM:i:28 MD:Z:2C15C2A7G0G4C33A7A2A24C0T28A41G27C0T154G0C0T0G0A0T16G36C0G21A32A0G0T5 YT:Z:UU
gi|14007281|gb|BG713331.1| 16 scaffold21|size132647 11064 42 645M * 0 0 NCNNNNNCGGCAGCACGCTGAAAGAACTGNCTCTGCCCATCTACTCCATCGGTATGGTGCTGGCATTCGCCTTTATTTCGAACTATTCCGGACTGTCATCAACACTGGCGCTGGCACTGGCGCACACCGGTCATGCATTCACCTTCTTCTCGCCGTTCCTCGGCTGGCTGGGGGTATTCCTGACCGGGTCGGATACCTCATCTAACGCCCTGTTCGCCGCGCTGCAAGCCACCGCAGCACAACAAATTGGCGTCTCTGATCTGTTGCTGGTTGCCGCCAATACCACCGGTGGCGTCACCGGTAAGATGATCTCCCCGCAATCTATCGCTATCGCCTGTGCGGCGGTAGGCCTGGTGGGCAAAGAGTCTGATTTGTTCCGCTTTACTGTCAAACACAGCCTGATCTTCACCTGTATAGTGGGCGTGATCACCACGCTTCAGGCTTATGTCTTAACGTGGATGATTCCTTAATGATTGTTTTACCCAGACGCCTGTCAGACGAGGTTGCCGATCGTGTGCGGGCGCTGATTGATGAAAAAAACCTGGAAGCGGGCATGAAGTTGCCCGCTGAGCGCCAACTGGCGATGCAACTCGGCGTATCACGTAATTCACTGCGCGAGGCGCTGGCAAAACTGGTGAGTGAAGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-7 XN:i:0 XM:i:7 XO:i:0 XG:i:0 NM:i:7 MD:Z:0G1A0C0C0T0T22G615 YT:Z:UU
gi|14006980|gb|BG713030.1| 0 scaffold38|size43565 24794 42 449M * 0 0 TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGTTGTCGGCGTCCATTTTATGTCGCTGGAACTG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:449 YT:Z:UU
gi|14006658|gb|BG712708.1| 0 scaffold38|size43565 24794 42 417M1I32M * 0 0 TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGNNNGTCGGCGTCCATTTTATGTCGCTGGAACTG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-10 XN:i:0 XM:i:2 XO:i:1 XG:i:1 NM:i:3 MD:Z:417T0T30 YT:Z:UU gi|14004118|gb|BG710168.1| 0 scaffold38|size43565 24794 42 449M * 0 0 TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGTTGTCGGCGTCCATTTTATGTCGCTGGAACTG 
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:449 YT:Z:UU
gi|226767304|gb|GO523315.1| 4 * 0 0 * * 0 0 ACTGGGGAAACCTTGCAGTTACGGAACTTAAACGCCTGGCAGCACGTGCCCCTTTCAGCACCTGGCGTAATCCGGAAGAGGCCCGCACCAATCGCCCTTCCAACGTGATGCGCAGCCTGAATGGTCAATGGGACT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
gi|209377782|gb|GE310270.1| 4 * 0 0 * * 0 0 AGTTGTAGTTTTTCAACTCATAGATGAGCACTACCCCTTTTGGGGGTTAATCACAAGTTTATCACCGATTGATGGCCCTTAAAGGGGGATTTCTTCTGGAGTTTCCCCTTCACCTGATTTGCAGGAAAGTAAATCACCGCTTTCACAACAGTGACCCACTACTACACACTAAACAACTGGTAAATCTTTTTAAGAGGATTGATCTTAACCAAGCTTAACAATCTTAATTTAATGCTAGGCACCATAGAGTGATGGTCTAGTTATATCATTTAAACCTGAATTAACTTTAACAAATTGAAAGCCTGGCTCCTCATGAGACTAGTTCTTTGTGCTAACCATATCTACTATTTCACATAGTAGAATACCTGAGTTTGCTACTAGGAATGTTCCTGGCTCAATTTCAAGTTTTAAATTTCTTTGATTTTACTGGTTGAATTCATTGATCTTATTTACTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
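One way to do the clustering in plain Python, with no extra dependencies, is to compute each alignment's span on the scaffold from its POS and CIGAR fields, then sweep through the sorted spans and merge the ones that overlap. This is essentially what `bedtools merge` does on a sorted BED file. The sketch below is a minimal illustration, not an established tool. It assumes a tab-delimited SAM file (the excerpt above shows spaces only because of forum formatting) and keeps only primary, mapped alignments. The function names and the `min_mapq`, `max_gap` and `stranded` parameters are hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# CIGAR operations that consume reference bases, per the SAM specification
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Convert a 1-based POS plus CIGAR into a 0-based, half-open (start, end) span."""
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    start = pos - 1
    return start, start + ref_len


def parse_sam(lines, min_mapq=0):
    """Yield (scaffold, strand, start, end, read_name) for primary mapped alignments."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, mapq, cigar = int(fields[3]), int(fields[4]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        if mapq < min_mapq:
            continue
        strand = "-" if flag & 0x10 else "+"
        start, end = reference_span(pos, cigar)
        yield rname, strand, start, end, qname


def cluster_alignments(records, max_gap=0, stranded=True):
    """Merge alignments that overlap or lie within max_gap bp on the same scaffold.

    Returns a list of (scaffold, start, end, strand, [read names]).
    With stranded=False, both strands are pooled and the strand is reported as ".".
    """
    groups = defaultdict(list)
    for rname, strand, start, end, qname in records:
        groups[(rname, strand if stranded else ".")].append((start, end, qname))

    clusters = []
    for (rname, strand), hits in sorted(groups.items()):
        hits.sort()  # sweep left to right by start coordinate
        cur_start, cur_end, members = hits[0][0], hits[0][1], [hits[0][2]]
        for start, end, qname in hits[1:]:
            if start <= cur_end + max_gap:  # overlaps or abuts the current cluster
                cur_end = max(cur_end, end)
                members.append(qname)
            else:  # gap too large: close this cluster and open a new one
                clusters.append((rname, cur_start, cur_end, strand, members))
                cur_start, cur_end, members = start, end, [qname]
        clusters.append((rname, cur_start, cur_end, strand, members))
    return clusters
```

For example, `with open("SRR001665_on_scaffolds.sam") as fh: clusters = cluster_alignments(parse_sam(fh))` gives one (scaffold, start, end, strand, read names) tuple per cluster; a BAM file can be streamed through `samtools view` first. On the excerpt above, the three identical alignments at scaffold38:24794 fall into one cluster. If the EST orientation is not reliable, `stranded=False` pools both strands.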
You could use a tool like Prokka to do gene identification/annotation easily. NCBI also offers its Prokaryotic Genome Annotation Pipeline (PGAP), assuming you will be making this data public at some point.
Thank you for your reply. Unfortunately, this tool isn't suitable for me for several reasons: a) it is too complicated for what I want, b) it doesn't have proper documentation, and c) it is written in Perl (BioPerl), which means it will be difficult to integrate into my Python/Bash-based pipeline. I am more interested in a manual approach (simple steps) to gene identification. However, if you could suggest a Python-based equivalent or a simpler tool, I could take a closer look at it. Thanks again :) I have updated my question to clarify this.
Prodigal does an excellent job of predicting protein-coding genes. I would suggest running cmscan (from Infernal, against the Rfam database) on top of that to find non-coding RNAs. That's basically what Prokka does (plus a few other things). Prodigal is standalone. You could use RNA-seq data to support the computational predictions: map the reads, convert the alignments to a BED file, and merge the overlapping intervals using bedtools.
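To keep everything inside a Python/Bash pipeline, Prodigal can be called as an external program and its GFF output read back into Python. A minimal sketch, assuming `prodigal` is on the `PATH` and the scaffolds are in a FASTA file; the file and function names are hypothetical. The flags used are Prodigal's own: `-i` input, `-o` output, `-f gff` output format, `-a` protein translations, and `-p single` single-genome mode. Coordinates are converted from GFF's 1-based inclusive convention to the 0-based, half-open convention used by BED files and bedtools.

```python
import subprocess


def run_prodigal(scaffolds_fasta, gff_out, proteins_out):
    """Run Prodigal in single-genome mode; raises CalledProcessError on failure."""
    subprocess.run(
        ["prodigal", "-i", scaffolds_fasta, "-o", gff_out, "-f", "gff",
         "-a", proteins_out, "-p", "single"],
        check=True,
    )


def parse_gff_cds(lines):
    """Yield (seqid, start, end, strand, gene_id) for CDS features.

    GFF coordinates are 1-based and inclusive; they are returned 0-based, half-open.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, the version line and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds "key=value;" pairs, e.g. "ID=1_1;partial=00;..."
        attrs = dict(kv.split("=", 1) for kv in cols[8].strip(";").split(";") if "=" in kv)
        yield cols[0], int(cols[3]) - 1, int(cols[4]), cols[6], attrs.get("ID", ".")


def to_bed_line(seqid, start, end, strand, gene_id):
    """Format one CDS as a BED6 line (chrom, start, end, name, score, strand)."""
    return f"{seqid}\t{start}\t{end}\t{gene_id}\t0\t{strand}"
```

For example, after `run_prodigal("scaffolds.fasta", "genes.gff", "proteins.faa")`, writing `to_bed_line(*cds)` for every tuple from `parse_gff_cds(open("genes.gff"))` produces a BED file that bedtools can compare with the merged RNA-seq intervals.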
Thank you Asaf, Prodigal seems a very good choice! I am most probably going to use it. Do you have anything equivalent in mind for eukaryotic genomes?
Unfortunately no. I'm not aware of a good public tool for eukaryotes.
Indeed, things are more complicated in eukaryotes, so there isn't a simple self-training tool like Prodigal. Also, for future reference and for anyone who comes across this thread: I was wrong earlier about the lack of documentation for Prokka. I just found an external source, http://metagenomics-workshop.readthedocs.io/en/latest/annotation/index.html, which is a step-by-step tutorial for annotation.
A final question regarding the use of RNA-seq data to enhance the predictions: as I understand it, you mean mapping the reads to the scaffolds and then combining the two separate results (1: genes from Prodigal, 2: alignments) using bedtools? I am asking because the only way I know to combine evidence-based data with ab initio results is to feed the evidence to the ab initio predictor at runtime, which Prodigal doesn't seem to support. Thanks again!
Prodigal will give you the coding regions, while the RNA-seq results will give you the transcripts, including ncRNAs. If you combine them, you'll get the genes with their UTRs. I'm not aware of a tool that does this combination neatly; bedtools might be a good way to intersect the ORFs from Prodigal with the experimental transcripts.
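As a rough illustration of that combination step, the sketch below does in plain Python what `bedtools intersect -a cds.bed -b transcripts.bed -s -wa -wb` would report (same-strand overlaps between CDSs and transcripts). It then extends each CDS to the outermost coordinates of the transcripts overlapping it and reports the extra sequence as 5' and 3' UTR lengths. This is an illustrative heuristic, not an established method. It assumes both inputs use 0-based, half-open coordinates, and the function name is hypothetical. Because bacterial transcripts are often polycistronic, a transcript covering a whole operon will make the "UTRs" of its genes include their neighbours, so the output needs checking.

```python
from collections import defaultdict


def annotate_utrs(cds_list, transcripts):
    """Extend each CDS to the span of the same-strand transcripts overlapping it.

    cds_list:    iterable of (seqid, start, end, strand, gene_id), 0-based half-open
    transcripts: iterable of (seqid, start, end, strand), 0-based half-open
    Returns a list of
    (gene_id, seqid, gene_start, gene_end, strand, utr5_len, utr3_len, n_supporting).
    """
    by_key = defaultdict(list)
    for seqid, start, end, strand in transcripts:
        by_key[(seqid, strand)].append((start, end))

    results = []
    for seqid, start, end, strand, gene_id in cds_list:
        # Half-open intervals overlap when each starts before the other ends
        # (a plain linear scan, kept simple for clarity).
        hits = [(s, e) for s, e in by_key.get((seqid, strand), []) if s < end and e > start]
        if not hits:
            # No transcript support: keep the CDS coordinates unchanged
            results.append((gene_id, seqid, start, end, strand, 0, 0, 0))
            continue
        gene_start = min(start, min(s for s, _ in hits))
        gene_end = max(end, max(e for _, e in hits))
        left, right = start - gene_start, gene_end - end
        # On the minus strand the 5' end of the gene is the higher coordinate
        utr5, utr3 = (left, right) if strand == "+" else (right, left)
        results.append((gene_id, seqid, gene_start, gene_end, strand, utr5, utr3, len(hits)))
    return results
```

Genes with `n_supporting == 0` are predictions without transcript evidence, while merged transcript intervals that overlap no CDS at all are candidates for ncRNAs, which the cmscan/Rfam step can help confirm.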