Identifying Species Specific Differences In De Novo Transcriptome Data
10.8 years ago
pld 5.1k

I will soon have an assembled transcriptome, and I have a good idea of the downstream analysis I would like to do. However, I am worried that basing the functional annotation on BLAST alone will hide species-specific differences. I am less interested in a basic assessment of how many genes in my model species have hits in the different reference organisms and more interested in highlighting the differences. In other words, BLAST will most likely show that more genes are shared than not, but what I need to know is how the shared genes differ.

First, I was wondering whether it would be worth duplicating the analysis with assembled transcripts and with predicted ORFs, or whether I should go a step further and run it on predicted peptide sequences.

Second, can anyone suggest ways to perform a more fine-grained functional/comparative analysis than just applying annotations based on BLAST results?

gene rna-seq transcriptome • 3.0k views
10.8 years ago

I agree that comparing de novo assembly results (especially if you want something quantitative, like differential expression) is likely to be tricky. Actually, I think this is true whether you are working with one species or several, but I agree that a multi-species comparison has additional complications. For example, a same-species comparison would still have trouble defining 1:1 relationships between contigs/transcripts, and minor differences in alignment could over-estimate differences in BLAST annotations (within the same species, this could be due to the top hit being a homolog in speciesA for one contig and a homolog in speciesB for another, where the two hits actually have very similar E-values but the runner-up is ignored if you only look at top hits).
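One way to see how often the top-hit issue matters is to flag contigs whose best and second-best subjects score almost the same. The following is only a rough sketch, assuming BLAST was run with tabular output (`-outfmt 6`, default 12 columns, bit score in the last column). The function name `ambiguous_top_hits` and the 5% bit-score window are my own illustrative choices, not an established threshold.

```python
import csv
import io
from collections import defaultdict


def ambiguous_top_hits(handle, max_bitscore_gap=0.05):
    """Flag queries whose runner-up subject scores close to the best one.

    handle: file-like object with BLAST tabular output (-outfmt 6).
    max_bitscore_gap: fractional window below the best bit score
        (hypothetical default, tune for your data).
    Returns {query: [(subject, bitscore), ...]} sorted best first.
    """
    best_per_subject = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Keep only the best-scoring HSP for each query/subject pair.
        if bitscore > best_per_subject[query].get(subject, 0.0):
            best_per_subject[query][subject] = bitscore

    flagged = {}
    for query, by_subject in best_per_subject.items():
        ranked = sorted(by_subject.items(), key=lambda kv: kv[1], reverse=True)
        best = ranked[0][1]
        close = [(s, b) for s, b in ranked if best - b <= max_bitscore_gap * best]
        if len(close) > 1:
            flagged[query] = close
    return flagged


# Made-up example rows; contig and subject names are placeholders.
example = (
    "contig1\tspeciesA_geneX\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t540.0\n"
    "contig1\tspeciesB_geneX\t97.5\t300\t7\t0\t1\t300\t1\t300\t1e-149\t535.0\n"
    "contig2\tspeciesA_geneY\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-100\t380.0\n"
    "contig2\tspeciesB_geneZ\t70.0\t150\t45\t0\t1\t150\t1\t150\t1e-20\t95.0\n"
)
flagged = ambiguous_top_hits(io.StringIO(example))
print(flagged)
```

Contigs that come back flagged are the ones where an annotation difference between two assemblies may just reflect a near-tie between homologs rather than a real difference.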

I don't have a great solution for this problem, except to recommend that you qualitatively compare the most highly expressed genes (using an arbitrary cutoff such as the top 30 or 40 genes). For example, I found that the CLC Bio contigs for adipose and muscle tissue showed logical differences in tissue-specific expression among these top genes (as well as for unranked positive controls). This actually worked better than both Trinity and Oases (which were designed specifically for RNA-Seq, unlike CLC Bio).
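As a minimal sketch of that top-N comparison, assuming you already have one expression value per annotated gene for each sample: the function names and the numbers below are invented for illustration, and the cutoff is as arbitrary here as it is above.

```python
def top_n(expression, n):
    """Return the set of the n most highly expressed gene names."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    return set(ranked[:n])


def compare_top(expr_a, expr_b, n=30):
    """Split the top-n genes of two samples into shared and sample-specific sets."""
    a, b = top_n(expr_a, n), top_n(expr_b, n)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Invented expression values, keyed by the gene name assigned to each contig.
adipose = {"ADIPOQ": 900.0, "LEP": 750.0, "ACTB": 700.0, "GAPDH": 400.0}
muscle = {"ACTA1": 950.0, "MYH7": 800.0, "ACTB": 720.0, "GAPDH": 380.0}

for label, genes in compare_top(adipose, muscle, n=3).items():
    print(label, sorted(genes))
```

The sample-specific sets are what you would then inspect by eye for biologically sensible differences.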

Beyond this, I can only recommend papers that may be potentially useful (mostly obtained from a Google search):

How to Compare 2 Differential expressed transcripts from 2 different de novo assembly?

http://www.biomedcentral.com/1471-2164/14/805

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082674

http://genomebiology.com/2013/14/2/R16

http://www.slideshare.net/AustralianBioinformatics/differential-expression-analysis-of-de-novo-assembled-transcriptomes


My model species is the only de novo sample; the rest will come from Ensembl (Cat, Dog, Cow, Horse, Human, Rhesus, Mouse). We have some conditions I can do the DE analysis with, but that is a side note. The goal of using stimuli was to activate different transcription profiles and so provide a broader pool of transcripts. BLAST gives me a rough idea of what a gene might do, but nothing else.

I was thinking more along the lines of sequence analysis or annotation. For example, geneX in humans has a known phosphorylation site at position Y, but in the model organism this site seems to be missing. This project is meant to find leads for molecular work, and I want to go further and provide readily testable targets.
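To make the phosphorylation-site example concrete, here is a rough sketch of the check I have in mind. It assumes the human protein and the model-organism peptide have already been aligned by some external tool, so both strings are the same length with `-` for gaps. The function names, the toy sequences, and the choice of S/T/Y as "still phosphorylatable" residues are illustrative assumptions only.

```python
def residue_at_reference_position(ref_aln, query_aln, ref_pos):
    """Map a 1-based position in the ungapped reference onto the query.

    Returns (ref_residue, query_residue, query_pos); query_residue is '-'
    and query_pos is None when the site falls in a query gap.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    query_count = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if query_char != "-":
            query_count += 1
        if ref_char != "-":
            ref_count += 1
            if ref_count == ref_pos:
                query_pos = query_count if query_char != "-" else None
                return ref_char, query_char, query_pos
    raise ValueError("ref_pos is beyond the end of the reference sequence")


def site_status(ref_aln, query_aln, ref_pos, allowed="STY"):
    """Classify an annotated reference site in the query sequence."""
    ref_res, query_res, query_pos = residue_at_reference_position(
        ref_aln, query_aln, ref_pos
    )
    if query_res == "-":
        status = "deleted"
    elif query_res == ref_res:
        status = "conserved"
    elif query_res in allowed:
        status = "substituted"  # different residue, but still S/T/Y
    else:
        status = "lost"
    return status, ref_res, query_res, query_pos


# Toy alignment, not real sequences.
human_aln = "MKTS-PLRSA"
model_aln = "MKTAQPL-SA"
for pos in (4, 7, 8):
    print(pos, site_status(human_aln, model_aln, pos))
```

Sites that come back as "lost" or "deleted" would be the candidates to follow up on, after checking that the alignment around them is trustworthy and that the gap is not an assembly artifact.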
