I merged transcriptome assemblies from many accessions using the stringtie --merge command. StringTie assigned the ID "MSTRG.1" to gene AT1G01010, which normally has one transcript (http://plants.ensembl.org/Arabidopsis_thaliana/Gene/Summary?db=core;g=AT1G01010;r=1:3600-6000;t=AT1G01010.1). However, here I can see two additional transcripts. Does that mean these two transcripts are novel transcripts of a known gene?
1 StringTie transcript 3631 5899 1000 + . gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; gene_name "NAC001"; ref_gene_id "AT1G01010";
1 StringTie exon 3631 3913 1000 + . gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; ref_gene_id "AT1G01010";
1 StringTie exon 3996 4276 1000 + . gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "NAC001"; ref_gene_id "AT1G01010";
1 StringTie exon 4486 4605 1000 + . gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "3"; gene_name "NAC001"; ref_gene_id "AT1G01010";
1 StringTie exon 4706 5095 1000 + . gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "4"; gene_name "NAC001"; ref_gene_id "AT1G01010";
1 StringTie exon 5174 5326 1000 + . gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "5"; gene_name "NAC001"; ref_gene_id "AT1G01010";
1 StringTie exon 5439 5899 1000 + . gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "6"; gene_name "NAC001"; ref_gene_id "AT1G01010";
1 StringTie transcript 3651 5899 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.2";
1 StringTie exon 3651 3913 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "1";
1 StringTie exon 3996 4276 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "2";
1 StringTie exon 4506 4605 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "3";
1 StringTie exon 4706 5095 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "4";
1 StringTie exon 5174 5326 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "5";
1 StringTie exon 5439 5899 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "6";
1 StringTie transcript 3657 5899 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.3";
1 StringTie exon 3657 3913 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "1";
1 StringTie exon 3996 4276 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "2";
1 StringTie exon 4486 5095 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "3";
1 StringTie exon 5174 5326 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "4";
1 StringTie exon 5439 5899 1000 + . gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "5";
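One way to see exactly how the two MSTRG transcripts differ from AT1G01010.1 is to compare their intron coordinates, derived from the exon records above. The following is only a minimal sketch: the exon coordinates are copied from the GTF lines in this post, while the helper names `introns` and `compare_to_reference` are made up for illustration and are not part of StringTie or any library.

```python
# Exon coordinates (start, end) copied from the GTF records in this post.
EXONS = {
    "AT1G01010.1": [(3631, 3913), (3996, 4276), (4486, 4605),
                    (4706, 5095), (5174, 5326), (5439, 5899)],
    "MSTRG.1.2": [(3651, 3913), (3996, 4276), (4506, 4605),
                  (4706, 5095), (5174, 5326), (5439, 5899)],
    "MSTRG.1.3": [(3657, 3913), (3996, 4276), (4486, 5095),
                  (5174, 5326), (5439, 5899)],
}


def introns(exons):
    """Return the introns implied by a list of (start, end) exons."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def compare_to_reference(ref_exons, query_exons):
    """Split introns into shared, reference-only, and query-only sets."""
    ref = set(introns(ref_exons))
    qry = set(introns(query_exons))
    return {
        "shared": sorted(ref & qry),
        "ref_only": sorted(ref - qry),
        "query_only": sorted(qry - ref),
    }


diff_2 = compare_to_reference(EXONS["AT1G01010.1"], EXONS["MSTRG.1.2"])
diff_3 = compare_to_reference(EXONS["AT1G01010.1"], EXONS["MSTRG.1.3"])

for name, diff in (("MSTRG.1.2", diff_2), ("MSTRG.1.3", diff_3)):
    print(name, "ref_only:", diff["ref_only"], "query_only:", diff["query_only"])
```

With these coordinates, MSTRG.1.2 differs from the reference in one intron (4277-4505 instead of 4277-4485, i.e. exon 3 starts 20 bp later), and MSTRG.1.3 lacks the 4606-4705 intron, which would be consistent with a retained intron. The slightly different transcript start positions (3651 and 3657 versus 3631) do not affect the intron comparison.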
Many thanks for the explanation. Instead of loading the BAM and GTF files in IGV, can we also investigate novel transcripts based on their TPM/FPKM values across samples, as generated by the second assembly?
Yes, sure, but remember that the expression values are for the whole transcript. Also, StringTie uses a maximum-flow algorithm to account for all sequencing reads, so the alternative transcripts it reports are not necessarily real and need further validation. If your goal is to investigate alternative transcripts further, you may also want to consider an exon-centric analysis (read up on featureCounts with the -f option for exon-level counting, and on DEXSeq).
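As a rough first pass on the TPM idea, you could keep only transcripts that reach some minimum TPM in a minimum number of samples. The sketch below is illustrative only: the function name `expressed_in_enough_samples`, the thresholds, and the TPM values are all hypothetical placeholders, not results from this dataset, and in practice the values would come from your own per-sample abundance estimates.

```python
def expressed_in_enough_samples(tpm_by_transcript, min_tpm=1.0, min_samples=2):
    """Keep transcripts with TPM >= min_tpm in at least min_samples samples.

    Returns a dict mapping each kept transcript ID to the number of
    samples in which it passed the TPM threshold.
    """
    kept = {}
    for transcript_id, tpms in tpm_by_transcript.items():
        n_passing = sum(1 for value in tpms if value >= min_tpm)
        if n_passing >= min_samples:
            kept[transcript_id] = n_passing
    return kept


# Hypothetical TPM values for three samples (placeholders, not real data).
example_tpm = {
    "AT1G01010.1": [12.4, 9.8, 15.1],
    "MSTRG.1.2": [0.0, 0.3, 0.1],
    "MSTRG.1.3": [2.2, 0.0, 3.5],
}

kept = expressed_in_enough_samples(example_tpm)
print(kept)
```

A filter like this only tells you that a transcript model received reads consistently. It does not confirm that the isoform is real, so the caveat about validation still applies.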