Differential analysis: isoform level or gene level?
4.0 years ago
agtbeeman • 0

Hello all,

I am new to bioinformatics and working on a project with two goals: building the first reference transcriptome of a species, and performing differential analysis between 2 temperature conditions at the isoform level with DESeq2. I have a few questions about methodology.

So far I have a reference transcriptome (I filtered my Trinity FASTA by quality, by redundancy, and by transcript expression).

My concern is that performing differential analysis at the isoform level does not seem to be recommended (https://support.bioconductor.org/p/43395/#43400).

So I am wondering whether I should switch tools to perform isoform-level analysis, or whether it is better to do the differential analysis at the gene level. I also wonder whether I have to cluster my transcripts (using a tool like Corset) before counting, since kallisto only gives counts at the transcript level, unless DESeq2 can use the transcript IDs to group them into genes?

Also, now that I am thinking about doing the analysis at the gene level, I am concerned about whether my filtering by transcript expression will skew the analysis.

Thank you for reading!

RNA-Seq alignment sequence gene • 1.9k views

Please use full words: level, not lvl. Small things like these are the difference between being a professional and not being one.


I have no idea what you're doing for your reference transcriptome (language is very unclear).

But to do gene-level analysis with DESeq2, you have to summarize the transcript-level estimates to gene-level (see: tximport).

If you want to do transcript-level differential expression analysis, I'd recommend using sleuth (note: sleuth can also do gene-level analysis).


OK, thanks. Sorry for being so unclear; I have just edited my post to improve it.

I have decided to do gene-level analysis with DESeq2. So far I have followed the documentation. My transcript IDs look like this: TRINITY_DN0_c0_g1_i2. I am not sure it is the right approach, but I created my tx2gene table like this:

    Transcript_id                Gene_id
  1 TRINITY_DN80838_c0_g1_i1 TRINITY_DN80838_c0_g1_
  2 TRINITY_DN80873_c0_g1_i1 TRINITY_DN80873_c0_g1_
  3 TRINITY_DN80855_c0_g1_i2 TRINITY_DN80855_c0_g1_

When I look at my final count matrix, it contains, for each gene, the sum of the estimated counts of all its isoforms. Is that expected?
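For illustration, a mapping like the one in the table can be derived by stripping the trailing isoform suffix (`_i<N>`) from each Trinity transcript ID. This is only a sketch in Python, not part of tximport or DESeq2; `trinity_gene_id` is a hypothetical helper name. Note that it yields gene IDs without the trailing underscore shown in the table; either form works as long as it is used consistently.

```python
import re

# Trinity names transcripts as <gene>_i<N>; removing the trailing
# isoform suffix leaves the Trinity "gene" identifier.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")

def trinity_gene_id(transcript_id: str) -> str:
    """Return the Trinity gene ID for a transcript ID (hypothetical helper)."""
    gene_id, n_subs = ISOFORM_SUFFIX.subn("", transcript_id)
    if n_subs == 0:
        # Fail loudly rather than silently mapping an unexpected ID to itself.
        raise ValueError(f"not a Trinity transcript ID: {transcript_id!r}")
    return gene_id

transcripts = [
    "TRINITY_DN80838_c0_g1_i1",
    "TRINITY_DN80873_c0_g1_i1",
    "TRINITY_DN80855_c0_g1_i2",
]

# Two-column table: transcript ID first, gene ID second.
tx2gene = [(tx, trinity_gene_id(tx)) for tx in transcripts]
for tx, gene in tx2gene:
    print(tx, gene, sep="\t")
```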


Yes. If you use tximport, it sums the estimated counts of all isoforms belonging to a gene to give the gene-level count.
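As a rough illustration of that summing step (a Python sketch, not tximport's actual code), the counts below are made up and the helper names `gene_of` and `sum_to_gene_level` are hypothetical; Trinity-style transcript IDs are assumed.

```python
from collections import defaultdict

# Made-up estimated counts for one sample, keyed by transcript ID.
est_counts = {
    "TRINITY_DN0_c0_g1_i1": 10.5,
    "TRINITY_DN0_c0_g1_i2": 4.5,
    "TRINITY_DN1_c0_g1_i1": 7.0,
}

def gene_of(transcript_id):
    # Drop the trailing "_i<N>" isoform suffix (Trinity naming assumed).
    return transcript_id.rsplit("_i", 1)[0]

def sum_to_gene_level(counts):
    # Add up the estimated counts of all isoforms that share a gene ID.
    gene_counts = defaultdict(float)
    for transcript_id, count in counts.items():
        gene_counts[gene_of(transcript_id)] += count
    return dict(gene_counts)

gene_counts = sum_to_gene_level(est_counts)
print(gene_counts)
```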


OK, thanks! I just found it a bit surprising; I would have expected it to take other data into account, such as isoform length.


I think some other methods, such as genome-alignment-based ones (e.g. Subread + featureCounts), can also give accurate gene-level expression counts?


It's actually more accurate to get gene-level expression from transcript-level estimates.

Many papers have been written on this, e.g. "Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences" (this is the tximport paper).
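One point made in that line of work is that a shift in isoform usage changes a gene's average transcript length, which plain gene-level counts cannot reflect. Alongside the summed counts, tximport reports a per-gene average transcript length weighted by transcript abundance, which DESeq2 can use as an offset. A minimal Python sketch of such a weighted average follows; the numbers are made up, `weighted_gene_length` is a hypothetical name, and the fallback for unexpressed genes is an assumption of this sketch.

```python
def weighted_gene_length(isoforms):
    """Abundance-weighted mean length for one gene in one sample.

    isoforms: list of (effective_length, tpm) pairs (hypothetical input format).
    """
    total_tpm = sum(tpm for _, tpm in isoforms)
    if total_tpm == 0:
        # Assumption of this sketch: fall back to the unweighted mean
        # when no isoform of the gene is expressed.
        return sum(length for length, _ in isoforms) / len(isoforms)
    return sum(length * tpm for length, tpm in isoforms) / total_tpm

# Made-up gene with a short (1000 bp) and a long (3000 bp) isoform.
# The two conditions differ only in which isoform dominates.
condition_a = [(1000.0, 90.0), (3000.0, 10.0)]
condition_b = [(1000.0, 10.0), (3000.0, 90.0)]

length_a = weighted_gene_length(condition_a)
length_b = weighted_gene_length(condition_b)
print(length_a, length_b)
```

With these numbers the gene's average length differs between the two conditions, so the same number of transcript molecules would produce different read counts; the length offset is meant to correct for that.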
