Hi,
I was wondering if anybody who knows more about StringTie could help clear up some questions about StringTie's --merge parameters (specifically the -G <guide_gff> argument).
I've run StringTie individually on a few samples, which to my knowledge generates a genome-guided transcriptome assembly for each sample, containing both novel and known transcripts. To get a non-redundant set from all the (replicate) samples I ran StringTie on, I use StringTie's --merge option, which seems to work fine so far.
According to StringTie's manual: "If the -G option (reference annotation) is provided, StringTie will assemble the transfrags from the input GTF files with the reference transcripts". My assumption was that it would somehow assist in merging the individual GTFs into the final non-redundant GTF. What I noticed, though, was that the resulting GTF file is much larger when that parameter is included. After some searching I found some discussion suggesting that the -G parameter actually adds the reference transcripts to the final merged GTF.
To summarise, if I only wanted to obtain novel transcripts for further analysis, would that mean I should not include the -G option during stringtie --merge? And would there be a different way of looking at possible sample/condition-specific transcripts of known genes?
Hello DGTool, did you ever work this out? I am having the same problem, as I only want the reference transcripts to be included if they are actually found in my data.
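One possible way to separate novel from known transcripts after merging is sketched below. It is only a sketch and rests on an assumption that is not confirmed anywhere in this thread: that the merged GTF has first been compared against the reference with gffcompare (using its -r option), so that each transcript line carries a class_code attribute. In gffcompare's scheme "=" marks a transcript whose intron chain matches a reference transcript, "j" a multi-exon transcript sharing at least one junction with a reference transcript (a candidate novel isoform of a known gene), and "u" an intergenic transcript. The function name filter_by_class_code and the example records are hypothetical.

```python
import re

# Matches GTF attribute pairs such as: transcript_id "MSTRG.1.1";
ATTR_RE = re.compile(r'([^\s;]+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def filter_by_class_code(gtf_lines, keep_codes):
    """Keep transcripts whose class_code is in keep_codes, plus their exons.

    Assumption: class_code is present on 'transcript' lines only, and
    exon lines are linked to their transcript through transcript_id.
    """
    lines = [ln.rstrip("\n") for ln in gtf_lines]

    # First pass: collect the IDs of transcripts with a wanted class code.
    kept_ids = set()
    for ln in lines:
        cols = ln.split("\t")
        if ln.startswith("#") or len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("class_code") in keep_codes:
            kept_ids.add(attrs.get("transcript_id"))

    # Second pass: emit every feature line belonging to a kept transcript.
    out = []
    for ln in lines:
        cols = ln.split("\t")
        if ln.startswith("#") or len(cols) < 9:
            continue
        if parse_attributes(cols[8]).get("transcript_id") in kept_ids:
            out.append(ln)
    return out


# Hypothetical records, made up for illustration only.
EXAMPLE = [
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\ttranscript_id "MSTRG.1.1"; gene_id "MSTRG.1"; class_code "u";',
    'chr1\tStringTie\texon\t100\t900\t.\t+\t.\ttranscript_id "MSTRG.1.1"; gene_id "MSTRG.1"; exon_number "1";',
    'chr1\tStringTie\ttranscript\t2000\t5000\t.\t-\t.\ttranscript_id "MSTRG.2.1"; gene_id "MSTRG.2"; class_code "=";',
    'chr1\tStringTie\texon\t2000\t5000\t.\t-\t.\ttranscript_id "MSTRG.2.1"; gene_id "MSTRG.2"; exon_number "1";',
]

if __name__ == "__main__":
    for line in filter_by_class_code(EXAMPLE, {"u", "j"}):
        print(line)
```

Under the same assumption, passing {"="} instead would keep only the assembled transcripts that match a reference transcript, which is closer to wanting reference transcripts only when they are actually supported by the data.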