Question

How To Find Novel Genes From Comparative Transcriptomics?

10.7 years ago

williamjohn360 ▴ 90

We have sequenced from two tissue of plants which are distantly related but are from same species. I want to compare the transcriptomes of these two plants and find novel genes by comparing them. How can I do that? Any suggestions?

I am thinking of the following workflow:
1. De novo assembly of the two plant transcriptomes separately
2. Gene expression analysis for each de novo assembly separately
3. Find novel overlapping genes between these plants

Is this workflow correct? Please correct me if I am wrong.

transcript • 4.8k views

updated 10.6 years ago by Adrian Pelin ★ 2.6k • written 10.7 years ago by williamjohn360 ▴ 90

Please let me know if this question is not clear.

10.7 years ago by williamjohn360 ▴ 90

Here are some issues:

"We have sequenced from two tissue of plants which are distantly related but are from same species." what do you mean by that?
We are not going to steal your ideas ;) Please tell us the name of the species, and maybe also the tissues.
Is there a reference genome?
Is there already a gene prediction?
You say you want "novel genes": novel with respect to what? The existing predictions for these plants, or novel to the world of genes?

10.7 years ago by Michael 55k

score 1 · Answer 1 · 2014-04-08

I assume you don't have a reference genome. In that case you can do a comprehensive de novo assembly combining the reads from both plants, then map each plant's reads back to that assembly and run a differential gene expression analysis. Any differential expression tool, such as DESeq, edgeR or Cufflinks, can do this. I recommend Cufflinks, which can do transcript assembly and find novel genes and transcripts. Refer to the Cufflinks manual for more information.
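As a rough illustration of what comparing expression after mapping to a shared assembly produces, here is a minimal sketch that scales per-transcript read counts to counts per million (CPM) and computes a log2 fold change between the two plants. This is not what DESeq or edgeR do internally (they fit count models and need biological replicates for meaningful statistics); the transcript IDs, counts and the pseudocount of 1 are hypothetical placeholders.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that one library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {tx: c * 1_000_000 / total for tx, c in counts.items()}


def log2_fold_change(
    counts_a: Dict[str, int], counts_b: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-transcript log2((CPM_b + pseudocount) / (CPM_a + pseudocount)).

    A transcript absent from one library counts as zero reads there, so
    transcripts seen in only one plant get large positive or negative values.
    """
    cpm_a = counts_per_million(counts_a)
    cpm_b = counts_per_million(counts_b)
    return {
        tx: math.log2((cpm_b.get(tx, 0.0) + pseudocount) / (cpm_a.get(tx, 0.0) + pseudocount))
        for tx in sorted(set(cpm_a) | set(cpm_b))
    }


# Hypothetical read counts per transcript of the combined assembly.
plant_a = {"tx1": 500, "tx2": 500, "tx3": 0}
plant_b = {"tx1": 250, "tx2": 750, "tx3": 0}
for tx, lfc in log2_fold_change(plant_a, plant_b).items():
    print(tx, round(lfc, 2))
```

The pseudocount only keeps the ratio defined when a transcript has zero reads in one plant; a real analysis would hand the raw counts to a dedicated tool instead.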

score 1 · Answer 2 · 2014-04-09

10.6 years ago

Adrian Pelin ★ 2.6k

Based on your question, I suggest de novo assembly with Oases, then jump to step 3 and find unique/overlapping genes using tblastx with an e-value cutoff of 1e-5. After you have found something interesting, you can quantify expression.

10.6 years ago by Adrian Pelin ★ 2.6k
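The "find unique/overlapping genes" step can be sketched as post-processing of tblastx tabular output. This minimal sketch assumes you ran each assembly against the other with BLAST+ `-outfmt 6` (default 12 columns) and applies the 1e-5 e-value cutoff from the answer. The function names and transcript IDs are hypothetical, and the reciprocal-best-hit pairing is one common way to call "overlapping" genes, not necessarily the answerer's exact method.

```python
from typing import Dict, Iterable, Set, Tuple

EVALUE_CUTOFF = 1e-5  # cutoff suggested in the answer

Hit = Tuple[str, float]  # (subject id, bitscore)


def best_hits(tabular_lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Dict[str, Hit]:
    """Best-scoring subject per query from BLAST+ -outfmt 6 lines.

    Assumes the default columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Hit] = {}
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = line.split("\t")
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # not significant at the chosen cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def split_shared_unique(all_ids: Set[str], hits: Dict[str, Hit]) -> Tuple[Set[str], Set[str]]:
    """Transcripts with at least one significant hit vs. those with none."""
    shared = {t for t in all_ids if t in hits}
    return shared, all_ids - shared


def reciprocal_best_hits(a_to_b: Dict[str, Hit], b_to_a: Dict[str, Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where each transcript is the other's best hit."""
    return {(a, b) for a, (b, _) in a_to_b.items() if b in b_to_a and b_to_a[b][0] == a}
```

Run it on both the A-vs-B and the B-vs-A output. Transcripts without a significant hit are only candidates for plant-specific genes: missing hits can also mean the gene was not expressed in the sampled tissue or was not assembled.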

Why use tblastx? Isn't it extremely slow? He could use blastx instead, no?

10.6 years ago by User000 ▴ 710

Think about what you are suggesting. He has two databases of assembled transcripts, and both are at the nucleotide level. How can he use blastx? blastx requires the query to be nucleotide and the database to be protein.

10.6 years ago by Adrian Pelin ★ 2.6k
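The constraint in the reply above can be written down as a lookup of which sequence types each BLAST+ search program expects for the query and the database. This is a small illustrative sketch, not part of BLAST itself; `compatible_programs` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (query type, database type) expected by each standard BLAST+ search program.
BLAST_PROGRAMS: Dict[str, Tuple[str, str]] = {
    "blastn": ("nucleotide", "nucleotide"),   # compared directly as DNA
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),      # query translated in all six frames
    "tblastn": ("protein", "nucleotide"),     # database translated in all six frames
    "tblastx": ("nucleotide", "nucleotide"),  # both translated, compared as protein
}


def compatible_programs(query_type: str, db_type: str) -> List[str]:
    """Programs whose expected input types match the given query and database."""
    return sorted(
        name for name, types in BLAST_PROGRAMS.items() if types == (query_type, db_type)
    )


# Two assembled transcriptomes: nucleotide query against a nucleotide database.
print(compatible_programs("nucleotide", "nucleotide"))
```

For two nucleotide assemblies only blastn and tblastx fit. tblastx translates both the query and the database in six frames, which makes it the slowest of the five, but it can detect similarity between coding sequences that have diverged at the DNA level; blastn is faster but compares the raw nucleotides.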

I was not suggesting it, I was just wondering. So you are suggesting running tblastx of one assembled transcript database against the other; in that case tblastx makes sense. I apologise for the misunderstanding; obviously blastx is out in this case. I have some doubts about whether to find the overlaps first and then do the gene expression (GE) analysis, or vice versa. I suppose the other way round is better, but if you have some evidence that it is better to find the overlaps first and then do GE, it would be interesting to know.

10.6 years ago by User000 ▴ 710