Predict gene in transcriptome
10.2 years ago
CikLa ▴ 90

Hi,

I am working with a plant genome that someone else has already annotated. They have predicted the genes in the genome and added some functional annotation to them.

Now new data has come in. I have transcriptome data, and I want to use it to find new genes that may have been missed in the previous annotation.

Does anyone have ideas on this? What approach and tools should I use? Can I use Tophat/Bowtie for this?

Thanks.

transcriptome predict gene • 4.2k views
10.2 years ago
smithtomsean ▴ 220

Hi CikAlal,

If I understand correctly, you want to use the transcriptome data (RNA-seq?) to improve the previous genome annotation?

If so, yes, you can use the Tuxedo suite to do this. You can map the RNA-Seq reads to the genome with Tophat using the previous annotations (GTF file) and assemble transcripts using Cufflinks http://cufflinks.cbcb.umd.edu/manual.html. This will undoubtedly throw up a whole load of spurious transcripts, however, so you'll want to apply some sensible filters (transcript is observed in multiple samples, minimum expression threshold, multi-exonic, etc.).
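As a rough illustration of the filtering step, here is a minimal Python sketch that keeps only multi-exonic transcripts above an expression cutoff. It assumes a Cufflinks-style GTF in which `transcript` lines carry an `FPKM` attribute and `exon` lines carry a `transcript_id`; the function name `filter_transcripts` and the default thresholds are hypothetical choices for illustration, not recommended values. The "observed in multiple samples" filter is left out, since it depends on how the samples were assembled and merged.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: transcript_id "CUFF.1.1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def filter_transcripts(gtf_lines, min_exons=2, min_fpkm=1.0):
    """Return sorted transcript IDs that pass both filters.

    min_exons and min_fpkm are illustrative defaults (assumptions),
    not values taken from any tool's documentation.
    """
    exon_counts = defaultdict(int)
    fpkm = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed lines
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        if cols[2] == "exon":
            exon_counts[tid] += 1
        elif cols[2] == "transcript" and "FPKM" in attrs:
            fpkm[tid] = float(attrs["FPKM"])
    # Transcripts with no recorded FPKM are treated as 0.0
    return sorted(
        tid
        for tid, n_exons in exon_counts.items()
        if n_exons >= min_exons and fpkm.get(tid, 0.0) >= min_fpkm
    )
```

You would call it with an open file handle, e.g. `filter_transcripts(open("transcripts.gtf"))`, and tune the cutoffs for your own data.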

However, there are a number of tools available for transcript assembly, so Cufflinks may not be the most appropriate for you. This paper provides a comparison of the most widely used (Augustus, Cufflinks, Exonerate, GSTRUCT, iReckon, mGene, mTim, NextGeneid, SLIDE, Transomics, Trembly and Tromer, Oases and Velvet) http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2714.html.

Hopefully that'll get you started

10.2 years ago
CikLa ▴ 90

Hi smithtomsean,

Thank you for your response. Yes, I have RNA-seq data and want to use it to improve the previous genome annotation. I did some searching and found two possible solutions that might help in this case, but I would like comments or ideas from anyone.

First: http://cufflinks.cbcb.umd.edu/tutorial.html

If you want to discover new genes in a genome that has been annotated, you can use cuffcompare to sort out what is new in your assembly from what is already known. Run cuffcompare like this:

cuffcompare -s /seqdata/fastafiles/hg19/hg19.fa -r known_annotation.gtf merged_asm/merged.gtf

Cuffcompare will produce a number of output files that you can parse to select novel genes and isoforms.
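A minimal Python sketch of that parsing step, assuming the tab-separated `.tracking` file that cuffcompare writes, where the first four columns are the transfrag ID, the locus ID, the matched reference (`-` if none) and the class code. Class code `u` marks unknown, intergenic transfrags (candidate novel genes) and `j` marks potentially novel isoforms. The function name `novel_transfrags` is hypothetical.

```python
def novel_transfrags(tracking_lines, novel_codes=("u",)):
    """Return (transfrag_id, locus_id, class_code) for selected class codes.

    Assumes cuffcompare's .tracking layout: transfrag ID, locus ID,
    reference gene|transcript, class code, then one column per sample.
    """
    hits = []
    for line in tracking_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4:
            continue  # skip blank or malformed lines
        transfrag_id, locus_id, _reference, class_code = cols[:4]
        if class_code in novel_codes:
            hits.append((transfrag_id, locus_id, class_code))
    return hits
```

Passing `novel_codes=("u", "j")` would also pull out candidate novel isoforms of already annotated genes.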

Second: http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=IncorporatingRNAseq.Tophat

Incorporating Illumina RNAseq into AUGUSTUS with Tophat

For this one, is it correct to say that this approach is better than the first one above, since it includes AUGUSTUS in the prediction?


I think both approaches make sense in this case, am I right?

Regards


Hi CikAlal,

I've not got any experience with using RNA-Seq data for this purpose myself, so I can't offer much advice. From the second paper I referred to above, I would expect incorporating AUGUSTUS to improve the sensitivity of your annotations (see Fig. 5). This will depend upon the organism you're working with, though, as AUGUSTUS has not been trained on all species and is eukaryote-specific as far as I can tell (see http://bioinf.uni-greifswald.de/augustus/).

Tom

P.S. If you're commenting on an answer, there's a comment box under the answer. Otherwise your response appears as an answer to the original question. I'm not sure if there's a page somewhere explaining how to use the site (if there is, I should read it myself!)


Hi Tom,

Thank you, I'll try the second one (Incorporating Illumina RNAseq into AUGUSTUS with Tophat) to see how it goes. Sorry, which Fig. 5 are you referring to? AUGUSTUS has not been trained on my plant genome yet, so does that mean I need to train it first using my RNA-seq data?

P.S. Thanks for that. Before this I was just a reader; I have only just joined Biostar as a member :)

Regards,
CikAlal.
