Question

Suggestions to measure differential expression of two transcripts


Asked 8.5 years ago by colonppg

Dear all:

I'm new to this and have a quick question. I'd like suggestions on how to test for differential expression of two transcripts from the same gene. Here is what I did:

Mapped the reads and generated .bam files using TopHat2.
Ran "grep genename the.human.gtf.file" to pull the gene's records from the same GTF file used with TopHat2, kept only the exon lines, and saved them to gene.bed.
Ran "bedtools multicov -bams bamfiles -bed gene.bed > output.bed" in a loop to get counts for all the BAM files.
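A minimal Python sketch of the GTF-to-BED step, under stated assumptions: the helper name gtf_exons_to_bed is hypothetical, and it assumes the GTF carries a gene_name attribute (as Ensembl and GENCODE GTFs do). It handles two easy-to-miss details. First, a plain "grep genename" also matches other genes whose names merely contain that string. Second, GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so the start has to be shifted by one when writing a true BED file. (bedtools can also read GTF/GFF input directly, and multicov's -bams option accepts several BAM files in one call, so the loop may not be needed.)

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_exons_to_bed(gtf_lines, gene_name):
    """Return BED6 tuples for the exon records of one gene (hypothetical helper).

    GTF is 1-based and inclusive; BED is 0-based and half-open,
    so the BED start is the GTF start minus one and the end is unchanged.
    """
    bed = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon features only
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("gene_name") != gene_name:
            continue  # exact match, unlike a substring grep
        name = "{}_exon{}".format(
            attrs.get("transcript_id", "."), attrs.get("exon_number", "?")
        )
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        bed.append((chrom, start - 1, end, name, 0, strand))
    return bed
```

To produce gene.bed, open the GTF, pass the file handle as gtf_lines, and write each returned tuple as one tab-joined line.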

The issues:

The BED file I generated contains multiple transcript IDs, and the transcripts share most of their exons. My current idea is to look only at the exons that differ between them.
For short exons, the counts are not reliable, since many samples have no reads mapping to them.
If I "normalize" by the total number of mapped reads in each sample and by exon length, will that be enough?
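Dividing by each sample's total mapped reads and by exon length, as described here, is essentially RPKM (reads per kilobase per million mapped reads). A minimal sketch, assuming counts are held in plain dicts keyed by an exon name; rpkm_table is an illustrative helper, not a library function. It corrects for sequencing depth and exon length only: it does not correct for sequence-composition (e.g. GC) bias, and an exon with zero or a handful of reads still gives a noisy value after scaling.

```python
def rpkm_table(counts, exon_lengths, library_sizes):
    """RPKM = read count / (exon length in kb) / (total mapped reads in millions).

    counts:        {exon_name: [read count in sample 1, sample 2, ...]}
    exon_lengths:  {exon_name: exon length in bp}
    library_sizes: [total mapped reads in sample 1, sample 2, ...]
    """
    table = {}
    for exon, per_sample in counts.items():
        length_kb = exon_lengths[exon] / 1e3
        table[exon] = [
            count / length_kb / (mapped / 1e6)
            for count, mapped in zip(per_sample, library_sizes)
        ]
    return table


# Hypothetical exon name and numbers, for illustration only.
print(rpkm_table({"T1_exon3": [100, 0]}, {"T1_exon3": 500}, [20_000_000, 10_000_000]))
# → {'T1_exon3': [10.0, 0.0]}
```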

Thanks, and I'd appreciate any suggestions on packages or strategies. This is a time-series study, so it would be nice to show how the different transcripts change over time.

Tags: RNA-Seq, sequencing


So you want to know whether two transcripts of the same gene have different expression levels within the same sample?

Why not get transcript-level counts from tools like Cufflinks or StringTie and compare those?

Reply by GouthamAtla, 8.5 years ago


Or, more quickly, with Salmon or kallisto.
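A sketch of the transcript-level route. Salmon writes its per-transcript estimates to a tab-separated quant.sf file with the columns Name, Length, EffectiveLength, TPM and NumReads; kallisto's abundance.tsv uses target_id, length, eff_length, est_counts and tpm. The hypothetical helper isoform_fractions reads one such file and reports each chosen transcript's share of their combined TPM, which could then be tracked across timepoints. The transcript IDs and file path in the usage comment are placeholders.

```python
import csv


def isoform_fractions(quant_lines, transcript_ids, id_col="Name", tpm_col="TPM"):
    """Share of the combined TPM for each transcript in transcript_ids.

    quant_lines: an open Salmon quant.sf or kallisto abundance.tsv file,
    or any iterable of its lines. For kallisto pass id_col="target_id", tpm_col="tpm".
    """
    wanted = set(transcript_ids)
    tpm = {}
    for row in csv.DictReader(quant_lines, delimiter="\t"):
        if row[id_col] in wanted:
            tpm[row[id_col]] = float(row[tpm_col])
    total = sum(tpm.values())
    # If none of the transcripts is detected, the fractions are undefined (NaN).
    return {
        t: tpm.get(t, 0.0) / total if total > 0 else float("nan")
        for t in transcript_ids
    }


# Usage (placeholder path and IDs):
# with open("sample_t0/quant.sf") as fh:
#     print(isoform_fractions(fh, ["TX_A", "TX_B"]))
```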

Reply by Devon Ryan, 8.5 years ago


I disagree: read counts should not be compared between transcripts with different nucleotide compositions unless the data come from Pacific Biosciences single-molecule real-time (SMRT) sequencing or have been statistically adjusted for GC bias.

Reply by dario.garvan, 8.5 years ago


GC bias is minimal these days. Furthermore, tools like Salmon can correct for it if needed (for example, with Salmon's --gcBias option).

Reply by Devon Ryan, 8.5 years ago

Answer (posted 2016-05-19)

That question cannot be answered with your dataset. You are comparing read counts from two different regions with different sequence compositions. For example, 10 reads from one transcript's unique exon and 10 from the other's does not mean the two transcripts are expressed at the same level: they could be expressed at different levels, with the difference in GC content between the regions making the counts look equal.
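To check how different the unique exons actually are in composition, one could compare their GC content. A minimal sketch: gc_fraction and gc_difference are hypothetical helpers, and the exon sequences are assumed to have been extracted already (for instance with bedtools getfasta). A large GC difference only flags that a direct count comparison may be biased; it does not say how large the bias is.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def gc_difference(seq_a, seq_b):
    """Absolute difference in GC fraction between two sequences (e.g. two unique exons)."""
    return abs(gc_fraction(seq_a) - gc_fraction(seq_b))


print(gc_difference("GGGGCCCC", "AATTAATT"))
# → 1.0
```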

A better alternative is to compare each transcript with itself across the timepoints for which you have RNA-seq samples. Perhaps, over time, one transcript increases in expression while the other decreases. That is a defensible result, because you are comparing counts from the same sequence across conditions, so the sequence bias no longer affects the comparison.
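A minimal sketch of that within-transcript comparison: for one transcript at a time, express each timepoint relative to the first as a log2 fold change. It assumes the values are already normalized (e.g. TPM from Salmon or kallisto) and ordered by time. The helper log2_fold_changes and the pseudocount of 1 are illustrative choices, not a prescribed method, and the result is descriptive only; with replicates, a proper statistical test would still be needed.

```python
import math


def log2_fold_changes(values_by_time, pseudocount=1.0):
    """log2 change of ONE transcript at each timepoint relative to the first timepoint.

    values_by_time: normalized expression (e.g. TPM) for a single transcript, ordered by time.
    The pseudocount avoids log2(0) when a timepoint has no expression.
    """
    baseline = values_by_time[0] + pseudocount
    return [math.log2((v + pseudocount) / baseline) for v in values_by_time]


print(log2_fold_changes([3.0, 7.0, 15.0]))
# → [0.0, 1.0, 2.0]
```

Running it separately for each transcript and seeing one series rise while the other falls is the kind of result described above.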

You don't need to write your own software to get transcript-level counts; use well-established tools such as Salmon or kallisto, as Devon already recommended.