We need to decide whether to work with differentially expressed transcripts or differentially expressed genes.
We expected the gene-level quantification (FPKM and/or TPM) to be the sum over all transcripts of that gene, but this is not what we observe. Ballgown appears to sum the FPKM and/or TPM of only some of the gene's transcripts. This makes us hesitant to use differentially expressed genes in later analyses, since the per-gene FPKM and/or TPM estimates seem biased by not including all transcripts.
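One way to check this expectation independently of Ballgown is to sum the transcript-level TPM values straight from the StringTie GTF and compare them with the gene-level numbers. The following is only a minimal sketch: it assumes the GTF has `transcript` rows carrying `gene_id` and `TPM` attributes (as StringTie normally writes them), and the function name `sum_tpm_per_gene` and the example rows are made up for illustration.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def sum_tpm_per_gene(gtf_lines):
    """Sum the TPM attribute of every 'transcript' row, grouped by gene_id."""
    totals = defaultdict(float)
    for line in gtf_lines:
        # Skip blank lines and header/comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only transcript rows carry TPM; exon rows are ignored
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "TPM" in attrs:
            totals[attrs["gene_id"]] += float(attrs["TPM"])
    return dict(totals)


# Made-up rows in StringTie-like layout, for illustration only
example_gtf = [
    "# hypothetical example\n",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "gene1"; transcript_id "gene1.1"; cov "10.0"; FPKM "2.5"; TPM "4.0";\n',
    'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\t'
    'gene_id "gene1"; transcript_id "gene1.1"; exon_number "1"; cov "10.0";\n',
    'chr1\tStringTie\ttranscript\t100\t700\t1000\t+\t.\t'
    'gene_id "gene1"; transcript_id "gene1.2"; cov "4.0"; FPKM "1.0"; TPM "1.5";\n',
    'chr2\tStringTie\ttranscript\t50\t400\t1000\t-\t.\t'
    'gene_id "gene2"; transcript_id "gene2.1"; cov "8.0"; FPKM "2.0"; TPM "3.0";\n',
]

gene_tpm = sum_tpm_per_gene(example_gtf)
for gene_id, tpm in sorted(gene_tpm.items()):
    print(gene_id, tpm)
```

For a real file, pass an open file handle instead of the list, e.g. `sum_tpm_per_gene(open("sample.gtf"))`, where `sample.gtf` is a placeholder name.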
On the other hand, within the same gene one transcript may be more highly expressed in one group while another transcript is more highly expressed in the other group (alternative splicing). For that reason we are leaning towards working with differentially expressed transcripts.
How can we get transcript_id values in the StringTie output (.gtf file) that are compatible with a database (NCBI or Ensembl)? In our current output the transcript_id values are:
"gene1.1"
"gene1.2"
"gene2.1"
"gene3.1"
"gene3.2"
"gene3.3"
..
Which annotation file (.GFF) did you use when mapping the transcripts? You need that same GFF file together with the StringTie output, which is a .gtf file (a GFF variant). Use the gffcompare tool to compare the two and get the reference IDs for the transcripts.
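Once gffcompare has been run against the reference annotation, it writes a tab-separated `.tmap` file linking each StringTie transcript to its closest reference transcript. Below is a rough sketch of how that table could be turned into an ID lookup. It assumes the header contains the columns `ref_gene_id`, `ref_id`, `class_code` and `qry_id`, and that unmatched transcripts have `-` as `ref_id`; please check this against your own file. The function name `read_tmap` and the reference IDs in the example (`REF_TX_A` etc.) are placeholders, not real accessions.

```python
import csv
import io


def read_tmap(handle):
    """Map each query (StringTie) transcript ID to its reference match.

    Returns {qry_id: (ref_id, ref_gene_id, class_code)} and skips
    transcripts that have no reference match (ref_id == "-").
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["ref_id"] == "-":
            continue  # novel transcript, nothing to map to
        mapping[row["qry_id"]] = (
            row["ref_id"],
            row["ref_gene_id"],
            row["class_code"],
        )
    return mapping


# Made-up .tmap content with placeholder reference IDs, for illustration only
example_tmap = io.StringIO(
    "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\n"
    "REF_GENE_A\tREF_TX_A\t=\tgene1\tgene1.1\n"
    "REF_GENE_A\tREF_TX_B\tj\tgene1\tgene1.2\n"
    "-\t-\tu\tgene2\tgene2.1\n"
)

id_map = read_tmap(example_tmap)
for qry_id, (ref_id, ref_gene_id, class_code) in id_map.items():
    print(qry_id, ref_id, ref_gene_id, class_code)
```

Class code `=` marks a transcript whose intron chain matches the reference exactly, so you may want to keep only those rows when you need a one-to-one correspondence with database IDs.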