Hello, I followed the StringTie and prepDE.py pipeline exactly as described in http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de, running prepDE.py on the ballgown directory it creates.
However, the majority of the transcript IDs given in the StringTie merged file are not present in the gene count table; the IDs there are mostly StringTie IDs (MSTRG). Would it be okay to take the reference IDs from the merged file and use them to replace the StringTie IDs in the count matrix?
This is my merged file:
chr01 StringTie transcript 2983 10815 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" ref_gene_id "Os01g0100100"
chr01 StringTie exon 2983 3268 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" exon_number "1"
chr01 StringTie exon 3354 3616 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" exon_number "2"
chr01 StringTie exon 4357 4455 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" exon_number "3"
This is my gene count matrix for all the samples:
MSTRG.1 41 86 143 167 304 343 46 51 170 320 44 69 167 102 129 311 310 114 97 301 305 25 62
MSTRG.10 9 6 4 3 6 31 2 4 3 6 3 2 36 2 2 17 11 2 1 5 6 2 6
MSTRG.100 8 13 10 14 14 18 5 4 8 11 0 0 0 0 2 0 6 0 0 4 2 0 0
I had the same problem; however, it is solved in version 1.3.3, where the merge step is no longer necessary.
Thanks for the input. I am planning to re-run using version 1.3.3.
Hi, unfortunately my issue is not resolved with version 1.3.3. I ran the program three times: once to create the GTF files, once to merge them, and once to produce the Ballgown output. I then ran prepDE.py on the ballgown folder to get the gene count matrix. I still get mostly StringTie IDs. Is there a way I can map the IDs back? I would be grateful if I could email you part of my data for you to have a look.
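One possible way to map the IDs back, as a rough sketch rather than an official StringTie feature: read the merged GTF, collect the ref_gene_id values seen for each MSTRG gene_id, and append them to the row labels of the count matrix. The helper names build_gene_map and relabel are made up for illustration. The sketch assumes that ref_gene_id is present only on transcripts that match the reference annotation, and that one MSTRG locus may carry several reference genes or none, so the original MSTRG ID is kept in the label instead of being overwritten.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "MSTRG.1"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

def build_gene_map(gtf_lines):
    """Collect, for each gene_id, the set of ref_gene_id values seen in the GTF."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        attrs = dict(ATTR_RE.findall(line))
        gene_id = attrs.get("gene_id")
        ref_gene = attrs.get("ref_gene_id")
        if gene_id and ref_gene:
            mapping[gene_id].add(ref_gene)
    return mapping

def relabel(gene_id, mapping):
    """Return 'MSTRG.n|ref1,ref2' when reference IDs are known, else the ID unchanged."""
    refs = sorted(mapping.get(gene_id, ()))
    if not refs:
        return gene_id  # novel locus: no reference gene to attach
    return gene_id + "|" + ",".join(refs)

# Example using lines from the merged file shown in the question
merged_gtf_lines = [
    'chr01 StringTie transcript 2983 10815 1000 + 0 gene_id "MSTRG.1" '
    'transcript_id "Os01t0100100-01" ref_gene_id "Os01g0100100"',
    'chr01 StringTie exon 2983 3268 1000 + 0 gene_id "MSTRG.1" '
    'transcript_id "Os01t0100100-01" exon_number "1"',
]
gene_map = build_gene_map(merged_gtf_lines)
new_labels = [relabel(g, gene_map) for g in ["MSTRG.1", "MSTRG.10"]]
print(new_labels)
```

For a real run, the lines would come from the merged GTF file and the labels from the first column of the gene count matrix. Rows whose MSTRG ID has no ref_gene_id stay unchanged, and rows with several reference genes should be checked by hand before any ID is replaced outright.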
Hello Saeed, how did you do the next step after step 6 ("Estimate transcript abundances and create table counts for Ballgown") and switch to DESeq? Kindly guide me; I am very new to this work. Thanks.
This is probably more appropriate as a new question.