Question

StringTie merged transcripts for two different conditions

3.6 years ago

vitor ▴ 130

Hi guys,

When using StringTie to find new transcripts in two different conditions, for example treatment_A and treatment_B, do I need two separate merged .gtf files, one per condition? I would then run StringTie -eB on the treatment_A samples with the merged .gtf built from that condition, and do the same for treatment_B.

Or can I pool the .gtf files from all samples into a single merged .gtf and then run StringTie -eB against it? Which approach is better for identifying possible novel transcripts?

Thanks!

gtf coverage stringtie • 1.2k views

updated 2.2 years ago by Ram 45k • written 3.6 years ago by vitor ▴ 130

score 2 · Accepted Answer · 2021-11-10

If the aim is to do differential expression, then you definitely need to use a pooled GTF. All RNA-seq differential expression statistics rely on the idea that the same thing is being quantified in both conditions. Read coverage is biased (ten copies of gene A will yield a different number of reads than ten copies of gene B), but the biases are the same for the same gene in different samples, so read counts for the same gene can be compared across samples.

Even if your aim is to find transcripts that are 0 in one condition (i.e. entirely absent) and present in the other, I'd still use a pooled GTF: you want to confirm that the transcript isn't expressed by directly trying to quantify it.
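
For what it's worth, here is a rough sketch of the pooled workflow, written as a small Python script that only builds the StringTie command lines (it does not run them). The sample names, file paths, and the reference annotation `reference.gtf` are all hypothetical placeholders. The flags used (`--merge`, `-G`, `-o`, `-e`, `-B`) are standard StringTie options, but check them against your installed version.

```python
# Sketch: one pooled merge across ALL samples, then -e -B quantification
# of every sample against that single merged GTF.
# All sample names and paths below are hypothetical placeholders.

def merge_command(sample_gtfs, merged_gtf, reference_gtf=None):
    """Build a `stringtie --merge` command pooling every per-sample GTF."""
    cmd = ["stringtie", "--merge"]
    if reference_gtf is not None:
        cmd += ["-G", reference_gtf]  # optional reference annotation
    cmd += ["-o", merged_gtf]
    cmd += list(sample_gtfs)  # GTFs from both conditions together
    return cmd


def quant_command(bam, merged_gtf, out_dir):
    """Build a `stringtie -e -B` command for one sample."""
    return [
        "stringtie", "-e", "-B",
        "-G", merged_gtf,  # the same pooled GTF for every sample
        "-o", f"{out_dir}/quant.gtf",
        bam,
    ]


# Hypothetical layout: two samples per condition.
samples = {
    "treatment_A": ["A1", "A2"],
    "treatment_B": ["B1", "B2"],
}
all_names = [name for names in samples.values() for name in names]

merge_cmd = merge_command(
    [f"assembly/{name}.gtf" for name in all_names],
    "merged.gtf",
    reference_gtf="reference.gtf",
)
quant_cmds = [
    quant_command(f"bam/{name}.bam", "merged.gtf", f"ballgown/{name}")
    for name in all_names
]

print(" ".join(merge_cmd))
for cmd in quant_cmds:
    print(" ".join(cmd))
```

The point is that `merged.gtf` is built once from all samples and passed to every `-e -B` run, so a transcript assembled only in treatment_B still gets an explicit estimate in the treatment_A samples. To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)`.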