Hi,
I am using StringTie v2.1.1 on a single BAM file. I end up with a GTF file, but it looks like some transcripts are duplicated, for example:
chr1 StringTie transcript 729898 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.1"; cov "9.134856"; FPKM "1.605420"; TPM "2.896626";
chr1 StringTie exon 729898 729955 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "1"; cov "5.454021";
chr1 StringTie exon 732017 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "2"; cov "10.191729";
chr1 StringTie transcript 729898 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.2"; cov "3.270196"; FPKM "0.574726"; TPM "1.036966";
chr1 StringTie exon 729898 729955 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "1"; cov "1.947918";
chr1 StringTie exon 732013 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "2"; cov "3.642488";
In this example, I have two transcripts that start and end at the same positions. They also have the same exons, except that the second exon starts at position 732017 in one and at position 732013 in the other.
Consider another case:
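To see exactly where two StringTie transcripts diverge, one option is to group the exon records by `transcript_id` and compare the coordinate sets. The sketch below is a minimal, hypothetical helper (the function names are illustrative, not part of StringTie). It assumes the standard nine-column GTF layout, splits on any whitespace so it also works on the space-separated copy pasted here, and trims the `cov` attributes for brevity.

```python
import re
from collections import defaultdict

def exons_by_transcript(gtf_lines):
    """Group exon (start, end) coordinates by transcript_id (hypothetical helper)."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first eight GTF columns contain no spaces, so split on whitespace
        # at most 8 times and keep the attribute column intact.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    # Sort by genomic start so exons line up positionally
    return {tid: sorted(coords) for tid, coords in exons.items()}

def exon_differences(a, b):
    """Return (exons only in a, exons only in b)."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)

# Exon lines for STRG.28 from the output above (cov attributes trimmed)
lines = [
    'chr1 StringTie exon 729898 729955 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "1";',
    'chr1 StringTie exon 732017 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "2";',
    'chr1 StringTie exon 729898 729955 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "1";',
    'chr1 StringTie exon 732013 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "2";',
]

tx = exons_by_transcript(lines)
only_1, only_2 = exon_differences(tx["STRG.28.1"], tx["STRG.28.2"])
print("only in STRG.28.1:", only_1)  # [(732017, 732218)]
print("only in STRG.28.2:", only_2)  # [(732013, 732218)]
```

The same two functions apply unchanged to any other pair, such as STRG.1.1 and STRG.1.2, by passing the corresponding exon lines.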
chr1 StringTie transcript 13483 29654 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "27.115095"; FPKM "4.765386"; TPM "8.598089";
chr1 StringTie exon 13483 15038 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "23.612379";
chr1 StringTie exon 15796 15947 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "18.194462";
chr1 StringTie exon 16607 16765 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "13.165168";
chr1 StringTie exon 16858 17055 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "4"; cov "40.344353";
chr1 StringTie exon 17233 17368 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "5"; cov "52.639740";
chr1 StringTie exon 17606 17742 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "6"; cov "49.598957";
chr1 StringTie exon 17915 18061 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "7"; cov "45.024239";
chr1 StringTie exon 18268 18366 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "8"; cov "47.735268";
chr1 StringTie exon 24738 24891 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "9"; cov "16.246500";
chr1 StringTie exon 29534 29654 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "10"; cov "1.105868";
chr1 StringTie transcript 13483 29654 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; cov "4.613946"; FPKM "0.810885"; TPM "1.463064";
chr1 StringTie exon 13483 15038 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "1"; cov "2.968100";
chr1 StringTie exon 15796 15947 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "2"; cov "2.287062";
chr1 StringTie exon 16607 16765 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "3"; cov "1.654875";
chr1 StringTie exon 16858 17055 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "4"; cov "5.071326";
chr1 StringTie exon 17233 17368 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "5"; cov "6.616868";
chr1 StringTie exon 17606 17742 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "6"; cov "6.234639";
chr1 StringTie exon 17915 18061 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "7"; cov "5.659593";
chr1 StringTie exon 18268 18369 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "8"; cov "6.363258";
chr1 StringTie exon 18913 24891 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "9"; cov "5.117283";
chr1 StringTie exon 29534 29654 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "10"; cov "0.139009";
Two transcripts that are almost identical, except that exon 9 starts at position 24738 in one and at position 18913 in the other, although both end at the same position (24891). Exon 8 also differs slightly: it ends at 18366 in STRG.1.1 and at 18369 in STRG.1.2.
What should I do in this case: consider them as a single isoform and add up their TPMs, keep them separate (but then, what is the reason for this?), or simply remove one of them?
This is on a human sample, assembled using hg38.
Thanks in advance for your help
Why is this different from any other case where there are two isoforms? I understand that the difference between the two is minute here, but they are still different.
Hi, as a side note to what Kristoffer has already posted, you could load the BAM and the StringTie-assembled GTF into a local IGV session. Once there, check the Sashimi plots. If those alternative exon start/stop sites are real, you should see splice-junction support for both exon boundaries.
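To know which junctions to look for in the Sashimi plot, you can derive each transcript's introns from its exon coordinates and keep the ones that are not shared. This is a minimal sketch with a hypothetical `introns` helper. It assumes GTF's 1-based, inclusive coordinates, so each intron runs from one base after an exon's end to one base before the next exon's start. IGV may label junction ends slightly differently, so treat the numbers as positions to locate rather than exact labels.

```python
def introns(exons):
    """Derive intron coordinates from 1-based, inclusive exon coordinates (hypothetical helper)."""
    ordered = sorted(exons)
    # Each intron spans the gap between consecutive exons
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]

# Exon coordinates of STRG.1.1 and STRG.1.2, copied from the GTF in the question
strg_1_1 = [(13483, 15038), (15796, 15947), (16607, 16765), (16858, 17055), (17233, 17368),
            (17606, 17742), (17915, 18061), (18268, 18366), (24738, 24891), (29534, 29654)]
strg_1_2 = [(13483, 15038), (15796, 15947), (16607, 16765), (16858, 17055), (17233, 17368),
            (17606, 17742), (17915, 18061), (18268, 18369), (18913, 24891), (29534, 29654)]

j1, j2 = set(introns(strg_1_1)), set(introns(strg_1_2))
print("junctions only in STRG.1.1:", sorted(j1 - j2))  # [(18367, 24737)]
print("junctions only in STRG.1.2:", sorted(j2 - j1))  # [(18370, 18912)]
print("shared junctions:", len(j1 & j2))               # 8
```

Here the two isoforms share eight junctions and differ by one each, so the Sashimi plot around chr1:18,300-24,800 is where read support for one or the other junction would show up.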