Dear Biostars community,
I want to analyze gene isoform (transcript, Tx) expression, so I am running the IsoformSwitchAnalyzeR pipeline.
Because I am interested in novel Tx, I planned to perform per-sample Tx assembly followed by a cross-sample meta-assembly.
After some searching, StringTie looked like the best-suited tool, and it has a good reputation.
When using it, however, I was quite confused by the -G option, which the developers strongly recommend in the StringTie manual.
What confuses me is that -G can be specified both in single-sample assembly (step 1) and in meta-assembly (step 2), which gives four possible practices:
a) -G in both step 1 and step 2
b) -G in neither step 1 nor step 2
c, d) -G only in step 1 (c) or only in step 2 (d)
I couldn't figure out which one is correct.
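To make the four practices concrete, here is a minimal sketch that only builds (does not run) the StringTie command lines for each combination. It assumes per-sample BAMs, a reference GTF and a merge list file; the file names (`sample1.bam`, `ref.gtf`, `mergelist.txt`, `merged.gtf`) are hypothetical placeholders. It uses the standard StringTie options `-G`, `-o` and `--merge`, and leaves out all other tuning options.

```python
# Sketch: enumerate the four -G placements (a-d) as StringTie argv lists.
# All file names below are hypothetical placeholders.

REF_GTF = "ref.gtf"
BAMS = ["sample1.bam", "sample2.bam"]
MERGE_LIST = "mergelist.txt"  # text file with one per-sample GTF path per line


def build_commands(use_g_step1: bool, use_g_step2: bool) -> list[list[str]]:
    """Return argv lists for step 1 (per-sample assembly) and step 2 (--merge)."""
    cmds = []
    for bam in BAMS:
        out_gtf = bam.replace(".bam", ".gtf")
        cmd = ["stringtie", bam, "-o", out_gtf]
        if use_g_step1:
            cmd += ["-G", REF_GTF]  # reference-guided per-sample assembly
        cmds.append(cmd)
    merge = ["stringtie", "--merge", "-o", "merged.gtf"]
    if use_g_step2:
        merge += ["-G", REF_GTF]  # include reference transcripts in the merge
    merge.append(MERGE_LIST)
    cmds.append(merge)
    return cmds


# (use -G in step 1, use -G in step 2) for each practice in the list above
PRACTICES = {
    "a": (True, True),
    "b": (False, False),
    "c": (True, False),
    "d": (False, True),
}

if __name__ == "__main__":
    for label, (g1, g2) in PRACTICES.items():
        print(f"# practice {label}")
        for cmd in build_commands(g1, g2):
            print(" ".join(cmd))
```

The last command in each list is always the merge step, so a practice can be checked by looking only at where `-G` appears.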
I found another Biostars post about the same problem, but the answer didn't resolve my concern.
I tested assembling a single sample both with and without -G and compared the results with the bigWig coverage track. Assembly without -G gave more reasonable results. Here are some examples:
Example 1 (ACTB)
Example 2 (GAPDH)
Example 3 (CDH1)
Malachi Griffith's tutorial states:
in single-sample assembly (step 1): "To use de novo mode do NOT specify either of the -G OR -e options."
in meta-assembly (step 2): "-G tells stringtie where to find reference gene annotations. It will use these annotations to gracefully merge novel isoforms (for de novo runs) and known isoforms and maximize overall assembly quality."
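Read literally, the tutorial's advice corresponds to practice (d): de novo per-sample assembly (no -G, no -e) followed by a -G-guided merge. The minimal sketch below builds those commands and writes the merge list file that `stringtie --merge` takes as input. Sample names and paths are hypothetical, `-l` (the prefix for assembled transcript names) is an optional extra, and the helper names `step1_de_novo`, `step2_merge` and `is_de_novo` are made up for illustration.

```python
# Sketch of the tutorial's workflow: de novo step 1, reference-guided step 2.
# Sample names, paths and helper names are hypothetical.
import tempfile
from pathlib import Path


def step1_de_novo(bam: str, out_gtf: str, label: str) -> list[str]:
    """Per-sample de novo assembly: neither -G nor -e, as the tutorial quote says."""
    # -l sets the name prefix for assembled transcripts (StringTie default: STRG)
    return ["stringtie", bam, "-l", label, "-o", out_gtf]


def step2_merge(gtfs: list[str], ref_gtf: str, merge_list: Path, out_gtf: str) -> list[str]:
    """Write the GTF list file and return a --merge command guided by -G."""
    merge_list.write_text("\n".join(gtfs) + "\n")
    return ["stringtie", "--merge", "-G", ref_gtf, "-o", out_gtf, str(merge_list)]


def is_de_novo(argv: list[str]) -> bool:
    """True if the command uses neither -G nor -e."""
    return "-G" not in argv and "-e" not in argv


if __name__ == "__main__":
    samples = ["sample1", "sample2"]
    with tempfile.TemporaryDirectory() as tmp:
        step1 = [step1_de_novo(f"{s}.bam", f"{s}.gtf", s) for s in samples]
        step2 = step2_merge([f"{s}.gtf" for s in samples], "ref.gtf",
                            Path(tmp) / "mergelist.txt", "merged.gtf")
        for cmd in step1 + [step2]:
            print(" ".join(cmd), "| de novo:", is_de_novo(cmd))
```

Whether this is what the StringTie authors intend for de novo discovery is exactly the open question here; the sketch only encodes the tutorial's wording.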
Malachi Griffith is an expert in bioinformatics, but he is not the author of StringTie. Is this the correct explanation of how -G works when performing a meta-assembly with de novo transcript identification?