Hello!
My question focuses on the usage of StringTie.
I am a student completing an independent study, trying to detect novel lncRNA from RNA-Seq data. (I have 4 samples in triplicate, for a total of 12 FASTQ files.) I'm using HISAT2 for alignment and StringTie for assembly/abundance estimation. I've already generated GTF files for each triplicate. Currently I'm at the abundance estimation point of my project.
I'm currently implementing the --merge usage of StringTie. It is my understanding that merge mode will generate a non-redundant GTF file from the GTF files generated from each sample, as well as the reference annotation if included. This new merged GTF file is then used as a new annotation when determining DE. From the StringTie manual I find that I have the following options:
-G <guide_gff>
reference annotation to include in the merging (GTF/GFF3)
-m <min_len>
minimum input transcript length to include in the merge (default: 50)
-i
keep merged transcripts with retained introns (default: these are not kept unless there is strong evidence for them)
So, since I am trying to detect novel lncRNA and do not really care about the DE of annotated genes, would it be advisable for me to do the following:
- set the minimum transcript length to 200, since that is the minimum length of lncRNA? (-m option)
- to not use an additional reference annotation, because I am only looking for novel lncRNA (-G option)
- keep all retained introns, because this could lead to non-coding/loss of function characteristic of lncRNA (-i option)
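For concreteness, here is roughly the merge call I have in mind, written as a small Python sketch that only builds and prints the command (it does not run StringTie). The helper name build_merge_command and the file names mergelist.txt, merged.gtf and annotation.gtf are placeholders of my own, not anything from the manual; the flags are the ones listed above, plus -o for the output file.

```python
import shlex


def build_merge_command(gtf_list, out_gtf, min_len=200,
                        keep_retained_introns=True, guide_gff=None):
    """Build the argument list for a `stringtie --merge` call.

    gtf_list  : text file listing the per-sample GTFs, one per line (placeholder name)
    out_gtf   : path for the merged GTF (placeholder name)
    min_len   : value passed to -m (200 is my proposed lncRNA cutoff)
    guide_gff : optional reference annotation for -G; None means no -G at all
    """
    cmd = ["stringtie", "--merge", "-m", str(min_len)]
    if keep_retained_introns:
        cmd.append("-i")            # keep merged transcripts with retained introns
    if guide_gff is not None:
        cmd += ["-G", guide_gff]    # include the reference annotation in the merge
    cmd += ["-o", out_gtf, gtf_list]
    return cmd


# The three choices from my list: -m 200, no -G, and -i.
print(shlex.join(build_merge_command("mergelist.txt", "merged.gtf")))

# The same call with a reference annotation, for comparison.
print(shlex.join(build_merge_command("mergelist.txt", "merged.gtf",
                                     guide_gff="annotation.gtf")))
```

The first printed command is the one I am asking about; the second is what I would run if including the annotation turns out to be the better choice.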
Thank you in advance for any suggestions! I have never posted on this site before, so please tell me if I need to provide more information or follow additional guidelines.
note: edited for clarity
Hey, this question is relevant for me too! Were you able to come to a conclusion on whether any of these options are good for such an analysis?