From evidence-based alignments on de-novo assembly, to gene identification
7.2 years ago
chefarov ▴ 170

I am trying to perform gene prediction after a de novo assembly of DNA-seq reads (E. coli).

After producing the scaffolds, I used bowtie2 to map ESTs (random ones from E. coli) onto them. I therefore have SAM/BAM files containing the alignments of the evidence data (e.g. ESTs) to the scaffolds. My goal is to identify gene regions on the scaffolds.

The classic paper "A beginner's guide to eukaryotic genome annotation" suggests clustering the alignments in order to identify overlapping alignments and predictions. Does anyone have a practical idea of how to do that?

Thanks

PS1: I would prefer either a) ideas for a manual approach (simple steps) or b) Python/Bash-based toolkits.

PS2: An overview of the SAM file (alignments):

  @HD   VN:1.0  SO:unsorted
  @SQ   SN:scaffold1|size105789 LN:105789
  @SQ   SN:scaffold2|size142352 LN:142352
  @SQ   SN:scaffold3|size57540  LN:57540
  ....
  @SQ   SN:scaffold132|size37   LN:37
  @PG   ID:bowtie2  PN:bowtie2  VN:2.3.3    CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 -f -x SRR001665_scaffolds -S SRR001665_on_scaffolds.sam -U ESTS/seven_ests.fasta"
  gi|14475471|gb|BI067949.1|    4   *   0   0   *   *   0   0   AGTGTATGATGGTGTTTTTGAGGTGCTCCAGTGGCTTCTGTTTCTATCNNCTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTATCTCTGCTCTCACTGCCGTAAAACATGGCAACTGCAGTTCACTTACACCGCTTCTCAACCCGGTACGCACCAGAAAATCATTGATATGGCCATGAATGGCGTTGGATGCCGGGCAACAGCCCGCATTATGGGCGTTGGCCTCAACACGATTTTACGTCACTTAAAAAACTCAGGCCGCAGTCGGTAACCTCGCGCATACAGCCGGGCAGTGACGTCATCGTCTGCGCGGAAATGGACGAACAGTGGGGCTATGTCGGGGCTAAATCGCGCCAGCGCTGGCTGTTTTACGCGTATGACAGTCTCCGGAAGACGGTTGTTGCGCACGTATTCGGTGAACGCACTATGGCGACGCTGGGGCGTCTTATGAGCCTGCTGTCACCCTTTGACGTGGTGATATGGATGACGGATGGCTGGCCGCTGTATGAATCCCGCCTGAAGGGAAAGCTGCACGTAATCAGCAAGCGATATACGCAGCGAATTGAGCGGCATAACCTGAATCTGAGGCAGCACCTNNNNCGNNN    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    YT:Z:UU 
  gi|14007620|gb|BG713670.1|    0   scaffold38|size43565    37568   42  629M    *   0   0 ACACAAAGAAAAATTGAATAAACTGTATGATTTAAAAGATTATCGGGAGAGTTACCTCCCGATATAAAAGGAAGGATTTACAGAATGTGACCTAAGGTCTGGCGTAAATGTGCACCGGAACCGAGAAGGCCCGGATTGTCATGGACGATGAGATACACCGGAATATCATGGACATATTCTTTAAAGCGCCCTTTATCTTCAAATGCGGCACGGAAACCGGAGGCTTTGAAGAACTCAAGGAAGCGCGGCACGATACCGCCCGCAATAAACACGCCGCCAAATGTCCCGAGATTGAGCGCCAGATTGCCGCCAAAACGGCCCATAATGACGCAAAACAGCGACAATGCGCGGCGGCAATCGGTGCAGCTGTCAGCCAGCGCGCGTTCGGTAATATCTTTTGGCTTGAGATTTTCTGGCAGGCGGTTGTCAGCTTTCACAATTGCGCGATACAAATTCACCAGCCCAGGGCCAGAAAGCACGCGCTCCGCCGAAACATGACCAATTTCCGCACGCAATATTTCGAGGATAATGGCCTCTTCTTCACTATTCGGCGCAAAATCAACGTGACCGCCTTCGCCTGGCAAGCTTACCCAACGCTTATCGACATGGACCNNNTGCGCAACCCCAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:-9 XN:i:0  XM:i:4  XO:i:0  XG:i:0  NM:i:4  MD:Z:612A0G0A13G0   YT:Z:UU 
  gi|14007330|gb|BG713380.1|    16  scaffold21|size132647   11225   40  484M    *   0   0   GGTTGGCTGGGGGTATTCTTGCCCGGGTCNNATACGTCATCTAACGCCCTGTTCGCCGCGCTGCAAGCCGCCGCAGCTCANCAAATTGGCGTCTCTGATCTGTTGNNGGTTGCCGCCAATACCACCGGTGGCGTCGCCGGTAAGATGATCTCCCCGCAATCTATCGCTATCGCCTGTACGGCGGTAGGCCTGGTGGGCAAAGAGTNNGATTTGTTCCGCTTTACTGTCAAACACAGCCTGATCTTCACCTGTATAGTGGGCGTGATCACCACGCTTCAGGCTTATGTCTTAACGTGGATGATTCCTTAATGATTGTTTTACCCAGACGCCTGTCAGACGAGGTTGCCGATCGTGTGCGGGCNNNNNNTGATGAAAAAAACCTGTAAGCGGGCATGAAGTTGCCCGCTGAGCGCCAACTGGNTATGCAACTCGGCGTATCACGTCATTCACTGCGCGAGGCGCTGGCAAAACTGGTGNNNGAAGG    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    AS:i:-83    XN:i:0  XM:i:28 XO:i:0  XG:i:0  NM:i:28 MD:Z:2C15C2A7G0G4C33A7A2A24C0T28A41G27C0T154G0C0T0G0A0T16G36C0G21A32A0G0T5  YT:Z:UU 
  gi|14007281|gb|BG713331.1|    16  scaffold21|size132647   11064   42  645M    *   0   0   NCNNNNNCGGCAGCACGCTGAAAGAACTGNCTCTGCCCATCTACTCCATCGGTATGGTGCTGGCATTCGCCTTTATTTCGAACTATTCCGGACTGTCATCAACACTGGCGCTGGCACTGGCGCACACCGGTCATGCATTCACCTTCTTCTCGCCGTTCCTCGGCTGGCTGGGGGTATTCCTGACCGGGTCGGATACCTCATCTAACGCCCTGTTCGCCGCGCTGCAAGCCACCGCAGCACAACAAATTGGCGTCTCTGATCTGTTGCTGGTTGCCGCCAATACCACCGGTGGCGTCACCGGTAAGATGATCTCCCCGCAATCTATCGCTATCGCCTGTGCGGCGGTAGGCCTGGTGGGCAAAGAGTCTGATTTGTTCCGCTTTACTGTCAAACACAGCCTGATCTTCACCTGTATAGTGGGCGTGATCACCACGCTTCAGGCTTATGTCTTAACGTGGATGATTCCTTAATGATTGTTTTACCCAGACGCCTGTCAGACGAGGTTGCCGATCGTGTGCGGGCGCTGATTGATGAAAAAAACCTGGAAGCGGGCATGAAGTTGCCCGCTGAGCGCCAACTGGCGATGCAACTCGGCGTATCACGTAATTCACTGCGCGAGGCGCTGGCAAAACTGGTGAGTGAAGG   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:-7 XN:i:0  XM:i:7  XO:i:0  XG:i:0  NM:i:7  MD:Z:0G1A0C0C0T0T22G615 YT:Z:UU 
  gi|14006980|gb|BG713030.1|    0   scaffold38|size43565    24794   42  449M    *   0   0   TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGTTGTCGGCGTCCATTTTATGTCGCTGGAACTG   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:449    YT:Z:UU 
  gi|14006658|gb|BG712708.1|    0   scaffold38|size43565    24794   42  417M1I32M   *   0   0   TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGNNNGTCGGCGTCCATTTTATGTCGCTGGAACTG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  AS:i:-10    XN:i:0  XM:i:2  XO:i:1  XG:i:1  NM:i:3  MD:Z:417T0T30   YT:Z:UU gi|14004118|gb|BG710168.1|  0   scaffold38|size43565    24794   42  449M    *   0   0   TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGTTGTCGGCGTCCATTTTATGTCGCTGGAACTG   
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:449    YT:Z:UU 
  gi|226767304|gb|GO523315.1|   4   *   0   0   *   *   0   0   ACTGGGGAAACCTTGCAGTTACGGAACTTAAACGCCTGGCAGCACGTGCCCCTTTCAGCACCTGGCGTAATCCGGAAGAGGCCCGCACCAATCGCCCTTCCAACGTGATGCGCAGCCTGAATGGTCAATGGGACT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU 
  gi|209377782|gb|GE310270.1|   4   *   0   0   *   *   0   0   AGTTGTAGTTTTTCAACTCATAGATGAGCACTACCCCTTTTGGGGGTTAATCACAAGTTTATCACCGATTGATGGCCCTTAAAGGGGGATTTCTTCTGGAGTTTCCCCTTCACCTGATTTGCAGGAAAGTAAATCACCGCTTTCACAACAGTGACCCACTACTACACACTAAACAACTGGTAAATCTTTTTAAGAGGATTGATCTTAACCAAGCTTAACAATCTTAATTTAATGCTAGGCACCATAGAGTGATGGTCTAGTTATATCATTTAAACCTGAATTAACTTTAACAAATTGAAAGCCTGGCTCCTCATGAGACTAGTTCTTTGTGCTAACCATATCTACTATTTCACATAGTAGAATACCTGAGTTTGCTACTAGGAATGTTCCTGGCTCAATTTCAAGTTTTAAATTTCTTTGATTTTACTGGTTGAATTCATTGATCTTATTTACTGT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    YT:Z:UU
gene-prediction de-novo next-gen alignment gene • 2.3k views

You could use a tool like Prokka to do gene identification/annotation easily. NCBI also makes their prokaryotic annotation pipeline available, assuming you will be making this data public at some point.

Thank you for your reply. Unfortunately this tool isn't suitable for me, for several reasons: a) it is too complicated for what I want, b) it doesn't have proper documentation, and c) it is written in BioPerl, which would make it difficult to integrate into my Python/Bash-based pipeline. I am more interested in a manual approach (simple steps) to gene identification. However, if you could suggest a Python-based equivalent or a simpler tool, I could take a look inside it. Thanks again :) I have updated my question to clarify this.

Prodigal does an excellent job of predicting genes. I would suggest running cmscan (Rfam) on top of that. That's basically what Prokka does (plus some other things). Prodigal is standalone. You could also use RNA-seq data to enhance the computational predictions: map the reads, convert the alignments to a BED file, and merge the overlapping intervals with bedtools.
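For anyone who wants to see what the "alignments to intervals to merged clusters" step amounts to, here is a minimal pure-Python sketch. It is only an illustration of the idea, not how bedtools is implemented: the function names `sam_to_intervals` and `merge_intervals` are made up for this example, and it assumes that overlapping or directly adjacent alignments on the same scaffold belong to one cluster. The example records are the first six columns of mapped and unmapped lines from the SAM overview in the question.

```python
import re
from collections import defaultdict

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_to_intervals(lines):
    """Yield (scaffold, start, end) for each mapped record, 0-based half-open."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, scaffold = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # flag bit 0x4 means the read is unmapped
        ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                      if op in REF_CONSUMING)
        yield scaffold, pos - 1, pos - 1 + ref_len


def merge_intervals(intervals):
    """Merge overlapping or touching intervals per scaffold."""
    by_scaffold = defaultdict(list)
    for scaffold, start, end in intervals:
        by_scaffold[scaffold].append((start, end))
    merged = {}
    for scaffold, spans in by_scaffold.items():
        spans.sort()
        clusters = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= clusters[-1][1]:
                # extends the current cluster
                clusters[-1][1] = max(clusters[-1][1], end)
            else:
                clusters.append([start, end])
        merged[scaffold] = [tuple(c) for c in clusters]
    return merged


# First six SAM columns of some records from the question (sequence omitted).
EXAMPLE_SAM = [
    "@HD\tVN:1.0\tSO:unsorted",
    "gi|14475471|gb|BI067949.1|\t4\t*\t0\t0\t*",
    "gi|14007620|gb|BG713670.1|\t0\tscaffold38|size43565\t37568\t42\t629M",
    "gi|14007330|gb|BG713380.1|\t16\tscaffold21|size132647\t11225\t40\t484M",
    "gi|14007281|gb|BG713331.1|\t16\tscaffold21|size132647\t11064\t42\t645M",
    "gi|14006980|gb|BG713030.1|\t0\tscaffold38|size43565\t24794\t42\t449M",
    "gi|14006658|gb|BG712708.1|\t0\tscaffold38|size43565\t24794\t42\t417M1I32M",
]

if __name__ == "__main__":
    for scaffold, spans in merge_intervals(sam_to_intervals(EXAMPLE_SAM)).items():
        for start, end in spans:
            print(scaffold, start, end, sep="\t")
```

With these records, the two ESTs on scaffold21 collapse into one candidate region and the ESTs on scaffold38 into two. On a real data set you would read the SAM file line by line (or use bedtools on a sorted BED file) instead of a hard-coded list.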

Thank you Asaf, Prodigal seems a very good choice! I will most probably use it. Do you have anything equivalent in mind for eukaryotic genomes?

Unfortunately no. I'm not aware of a good public tool for eukaryotes.

Indeed, things in eukaryotes are more complicated, so there isn't a simple standalone predictor like Prodigal. Also, for future reference for people coming to this thread: I was wrong earlier about the lack of documentation for Prokka. I just found an external source, http://metagenomics-workshop.readthedocs.io/en/latest/annotation/index.html, which is a step-by-step annotation tutorial.

A final question regarding the use of RNA-seq data to enhance the predictions: as I understand it, you mean mapping the reads to the scaffolds and then somehow combining the two separate results (1: genes from Prodigal, 2: alignments) using bedtools? I am asking because the only way I know to combine evidence-based data with ab initio results is to feed the evidence to the ab initio predictor at runtime (something Prodigal doesn't seem to be capable of). Thanks again!

Prodigal will give you the coding regions, while the RNA-seq results will give you the transcripts, including ncRNAs. If you combine them you'll get the genes with their UTRs. I'm not aware of a tool that does this combination neatly; bedtools might be a good way to line up the ORFs from Prodigal with the experimental transcripts.
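As a rough illustration of that combination, the sketch below takes ORF intervals and transcript intervals and, for each ORF, reports the span extended by any overlapping transcript. Everything here is an assumption made for the example: the function name `extend_orfs_with_transcripts` is hypothetical, both inputs are assumed to be already loaded as 0-based half-open `(scaffold, start, end)` tuples (parsing Prodigal's output is not shown), strand is ignored, and the ORF coordinates in the usage lines are invented.

```python
def extend_orfs_with_transcripts(orfs, transcripts):
    """For each ORF, widen its span to cover every overlapping transcript.

    Returns one dict per ORF with the original ORF span, the extended
    "gene" span, and a flag saying whether any transcript overlapped it.
    """
    results = []
    for scaffold, orf_start, orf_end in orfs:
        gene_start, gene_end = orf_start, orf_end
        supported = False
        for t_scaffold, t_start, t_end in transcripts:
            # Half-open intervals overlap when each starts before the other ends.
            if t_scaffold == scaffold and t_start < orf_end and orf_start < t_end:
                supported = True
                gene_start = min(gene_start, t_start)
                gene_end = max(gene_end, t_end)
        results.append({
            "scaffold": scaffold,
            "orf": (orf_start, orf_end),
            "gene": (gene_start, gene_end),
            "supported": supported,
        })
    return results


if __name__ == "__main__":
    # Invented ORF coordinates; the transcript span is a merged EST region.
    orfs = [("scaffold38", 24900, 25200), ("scaffold1", 100, 400)]
    transcripts = [("scaffold38", 24793, 25242)]
    for row in extend_orfs_with_transcripts(orfs, transcripts):
        print(row)
```

The bases between the `orf` and `gene` boundaries are candidate UTRs. Keep in mind that bacterial transcripts are often polycistronic, so one transcript may overlap several ORFs and the extended span can then cover a whole operon rather than a single gene. The nested loop is fine for a handful of scaffolds; for large inputs, sorted intervals or bedtools would be the better route.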
