How to improve an existing transcriptome annotation using the Cufflinks protocol?
7.4 years ago
Chirag Parsania ★ 2.0k

I am working on a fungal transcriptome. The existing annotation for my species is not reliable. To improve it, we generated RNA-seq data under different stress conditions so that as many transcripts as possible are expressed, and I am trying to build one comprehensive transcriptome from those expressed transcripts using the Cufflinks protocol.

I ran Cufflinks and Cuffmerge with and without a reference annotation in the following combinations:

1) Cufflinks with reference, Cuffmerge with reference
2) Cufflinks with reference, Cuffmerge without reference
3) Cufflinks without reference, Cuffmerge with reference
4) Cufflinks without reference, Cuffmerge without reference
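
The four combinations can be enumerated programmatically, which helps keep the runs comparable. The sketch below only builds the command lines and does not run anything. It assumes that "with reference" means passing the annotation via `-g` to both tools, which the post does not state, and all file and directory names are hypothetical placeholders.

```python
from itertools import product


def build_commands(sample_bam, assemblies_list, ref_gtf, genome_fa):
    """Build Cufflinks/Cuffmerge command lines for the four reference combinations.

    Keys of the returned dict are (cufflinks_with_ref, cuffmerge_with_ref).
    """
    runs = {}
    for cuff_ref, merge_ref in product([True, False], repeat=2):
        cuff_tag = "ref" if cuff_ref else "noref"
        merge_tag = "ref" if merge_ref else "noref"

        cufflinks = ["cufflinks", "-o", f"cufflinks_{cuff_tag}"]
        if cuff_ref:
            # Assumption: -g (guided assembly) rather than -G (quantify only).
            cufflinks += ["-g", ref_gtf]
        cufflinks.append(sample_bam)

        cuffmerge = ["cuffmerge", "-s", genome_fa,
                     "-o", f"merged_{cuff_tag}_{merge_tag}"]
        if merge_ref:
            cuffmerge += ["-g", ref_gtf]
        cuffmerge.append(assemblies_list)

        runs[(cuff_ref, merge_ref)] = (cufflinks, cuffmerge)
    return runs


# Hypothetical input names, for illustration only.
runs = build_commands("sample.bam", "assemblies.txt", "ref.gtf", "genome.fa")
for (cuff_ref, merge_ref), (cufflinks, cuffmerge) in runs.items():
    print(" ".join(cufflinks))
    print(" ".join(cuffmerge))
```

Giving each combination its own output directory makes it easier to load the four results side by side in IGV.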

Each of them has its pros and cons. After some manual inspection in IGV, I decided to go with the 4th combination, which is Cufflinks without reference and Cuffmerge without reference. Still, I am not satisfied with the annotation it produced.

For example, consider the following scenario (attached image). Although I clearly have 3 transcripts expressed in all my conditions (shown in the green box), none of the above combinations of Cufflinks and Cuffmerge could detect the true transcript structure.

Can anyone tell me why this happens? Please also suggest any better option than Cufflinks and Cuffmerge, if one is available.

Assembly transcriptome cufflink cuffmerge • 3.3k views
7.4 years ago

Dear Chirag,

If there is no specific reason to use Cufflinks, I strongly recommend using StringTie. Check out the link below for a comparison of StringTie vs. Cufflinks. Though this comparison is for human RNA-seq data sets, I am sure StringTie will outperform Cufflinks on fungal data too.

Here is the link: https://ccb.jhu.edu/software/stringtie/index.shtml?t=example

Look under the heading "Comparisons to Cufflinks".

Please do share your findings. Would love to hear back!

Regards, Vijay


Hi Vijay,

Good to know about this newer transcriptome assembler (though it was published in 2014). It seems to be actively updated compared with Cufflinks (last release in 2014). I will try it and share the results.

Thanks a lot.


Chirag, you may also be interested in checking out the "new" tuxedo protocol. The old tuxedo protocol uses TopHat, Cufflinks, Cuffmerge and cummeRbund, while the new protocol is described in the image below (New tuxedo protocol).

Here is the paper link.

HISAT2 is a very efficient mapper in terms of memory and time, and its sensitivity is also good.
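
As a rough illustration of the alignment and assembly steps, the sketch below builds command lines for HISAT2, samtools sort and StringTie, the first tools in the published new protocol (Ballgown, the final R step, is not covered). It only constructs the commands. The sample name, index path, read files and thread count are hypothetical, and the `RF` / `--rf` strandness settings assume a paired-end fr-firststrand library, which should be checked against your own library prep.

```python
def new_tuxedo_commands(sample, index, reads_1, reads_2, ref_gtf=None, threads=4):
    """Build HISAT2 -> samtools sort -> StringTie command lines for one sample."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    gtf = f"{sample}.gtf"

    # --dta tailors the alignments for downstream transcript assemblers.
    align = ["hisat2", "-p", str(threads), "--dta",
             "--rna-strandness", "RF",  # assumption: fr-firststrand library
             "-x", index, "-1", reads_1, "-2", reads_2, "-S", sam]

    # StringTie expects a coordinate-sorted BAM file.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]

    assemble = ["stringtie", bam, "--rf", "-p", str(threads),
                "-l", sample, "-o", gtf]
    if ref_gtf:
        # Optional: use the existing annotation to guide the assembly.
        assemble += ["-G", ref_gtf]

    return [align, sort, assemble]


# Hypothetical names, for illustration only.
for cmd in new_tuxedo_commands("stress1", "genome_index",
                               "stress1_R1.fq.gz", "stress1_R2.fq.gz"):
    print(" ".join(cmd))
```

Leaving `ref_gtf` unset gives a reference-free assembly, which mirrors the "without reference" runs discussed in the question.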


Additionally, both pipelines come from the same team, Steven Salzberg's lab.


Hi Vijay,

I have used StringTie. The results are more or less the same as with Cufflinks. I decided not to merge transcripts from different samples, but rather to use the individual assemblies and filter them manually by user-specified criteria. Merging assemblies in a fungal genome may not be a good idea because the genes are very close to each other. One question I have here: why is the orientation of most transcripts reported by StringTie wrong? To check the strand, I colored the alignments by "first-of-pair strand" in IGV. Most of the transcripts are assigned an orientation regardless of the alignment color. I wonder how StringTie assigns strand information to the transcripts it generates.
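
Filtering a per-sample assembly by explicit criteria can be scripted. Below is a minimal sketch, assuming the StringTie GTF output carries `cov` and `TPM` attributes on its transcript lines; the function name and the threshold values are hypothetical and only illustrate the idea, they are not recommended cutoffs.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def filter_transcripts(gtf_lines, min_tpm=1.0, min_cov=5.0):
    """Return IDs of transcripts that pass the TPM and coverage thresholds."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue  # only transcript records carry the summary values used here
        attrs = dict(ATTR_RE.findall(fields[8]))
        tpm = float(attrs.get("TPM", 0))
        cov = float(attrs.get("cov", 0))
        if tpm >= min_tpm and cov >= min_cov:
            kept.append(attrs["transcript_id"])
    return kept


# Made-up records, for illustration only.
example = [
    "# StringTie output",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "25.3"; FPKM "12.1"; TPM "18.4";',
    'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\t'
    'gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "25.3";',
    'chr1\tStringTie\ttranscript\t1200\t1500\t1000\t-\t.\t'
    'gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "1.2"; FPKM "0.3"; TPM "0.4";',
]
print(filter_transcripts(example))
```

Running the same filter on every sample and comparing the surviving transcript models is one way to avoid merging across samples while still applying consistent criteria.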


Hi Chirag,

There was indeed a bug reported in an earlier version (5/18/2015, v1.0.4 release). However, you should be using the latest version, so that should not be an issue now.

Is your library strand-specific? According to the StringTie paper (page 4):

"We considered multi-exon transcripts to be correctly assembled only if their strand was also correctly identified, and when strand-specific RNA-seq data were used, we also required that single-exon transcripts were assigned to the correct strand"

Did you use the --fr or --rf option?


Yes, I know these options. My data is strand-specific (--rf), so I ran StringTie accordingly. I also tried the other option (--fr) to confirm that my library setting is correct. There is still an issue.

EDIT: More precisely, most of the transcripts are in the correct orientation. But there are still cases where the assigned strand is opposite to what the read color in IGV suggests. Most of those that I looked at have a lot of antisense transcription going on. This could be one reason why the assembler is not able to assign the correct strand.
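
When comparing IGV colors with assembled strands, it may help to spell out how read orientation maps to transcript strand. In an fr-firststrand library (StringTie `--rf`), read 1 aligns antisense to the transcript, so "first-of-pair strand" coloring shows the opposite of the transcript strand. The sketch below derives the implied transcript strand from a SAM flag; the function name is hypothetical, and treating single-end reads like read 1 is an assumption.

```python
REVERSE = 0x10         # SAM flag bit: read aligned to the reverse strand
FIRST_IN_PAIR = 0x40   # SAM flag bit: read 1 of the pair
SECOND_IN_PAIR = 0x80  # SAM flag bit: read 2 of the pair


def transcript_strand(flag, library="rf"):
    """Infer the transcript strand implied by one alignment of a stranded library.

    library="rf": fr-firststrand, read 1 is antisense to the transcript.
    library="fr": fr-secondstrand, read 1 is sense to the transcript.
    """
    if library not in ("rf", "fr"):
        raise ValueError("library must be 'rf' or 'fr'")
    read_is_reverse = bool(flag & REVERSE)
    is_first = bool(flag & FIRST_IN_PAIR)
    if not (flag & (FIRST_IN_PAIR | SECOND_IN_PAIR)):
        is_first = True  # assumption: single-end reads behave like read 1

    # Does this read have the same orientation as the transcript?
    read_is_sense = is_first if library == "fr" else not is_first
    on_plus = (not read_is_reverse) if read_is_sense else read_is_reverse
    return "+" if on_plus else "-"


# Flags 83/163 are the two mates of one properly paired fragment.
for flag in (83, 163, 99, 147):
    print(flag, transcript_strand(flag, "rf"), transcript_strand(flag, "fr"))
```

Both mates of a fragment give the same answer, so loci where the two library settings still disagree with the assembled strand are more likely to reflect real antisense signal than a labelling mix-up.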

7.4 years ago

You might be able to improve the annotation using dedicated software that can use transcriptome support as one line of evidence, and that can add information from the genomes and genes of evolutionarily related species, Pfam domains and so on. You can look at a variety of such tools, like MAKER, AUGUSTUS, FGENESH, GeneMark and many others, and you can combine their predictions. In any case, you will need to filter your results extensively based on many different statistics and data sources. For the genes of main interest, you had better go deeper and assess the results visually yourself, as you did. I am not sure you can reliably rebuild all gene models with their alternative splicing from RNA-seq data. Maybe it is possible at high coverage, but I do not remember the numbers.

7.4 years ago

There are tools out there that are even better than StringTie. Take a look at figure 2 in this article: http://www.ncbi.nlm.nih.gov/pubmed/27760567.

Those might be more interesting.

6.7 years ago
h.botond ▴ 50

Dear Chirag,

I am struggling with similar problems. I would like to improve my fungal genome annotation, especially the UTRs, using my RNA-seq data. Can you tell me about your experience with this problem? I have tried the classic and new tuxedo workflows, and Trinity as well, but none of them could detect the true transcript structure, or they did so only partially.

Thanks for any suggestions.

6.7 years ago

Fungi have quite tightly packed genomes. You could try this specialist option, SnowyOwl: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-229
