Hello, I'm using StringTie v2.0.4 for my RNA-seq data analysis. I have a reference genome and annotation file, but I still want to identify novel genes because the annotation for my species is incomplete. I used:
for f in *.bam;
do
echo "$f";
stringtie -p 8 -f 0.3 -j 5 -G gene_models_main.gff3 -o "${f%.*}.gtf" "$f";
done
to assemble each sample individually. I then used:
stringtie --merge -p 8 -G gene_models_main.gff3 -o stringtie_merged.gtf ./mergelist.txt
to get a non-redundant transcript GTF file.
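The two steps above can be wrapped in a small bash script. This is a minimal sketch, not the poster's actual script: the function names `make_mergelist` and `assemble_and_merge` are hypothetical, and it assumes every per-sample `.gtf` in the current directory should go into the merge. The one behavioral addition is that the merge list skips `stringtie_merged.gtf`, so re-running the script does not feed the previous merged output back into the merge.

```bash
# Hypothetical helper: write one per-sample GTF path per line to the merge
# list, skipping the merged output itself so a re-run does not include it.
make_mergelist() {
  local merged=$1 list=$2 gtf
  : > "$list"                          # create or truncate the list file
  for gtf in *.gtf; do
    [[ -e $gtf ]] || continue          # no .gtf files: glob stays literal
    [[ $gtf == "$merged" ]] && continue
    printf '%s\n' "$gtf" >> "$list"
  done
}

# Hypothetical wrapper around the two commands from the post:
# per-sample assembly, then stringtie --merge with the same annotation.
assemble_and_merge() {
  local annot=$1 merged=stringtie_merged.gtf list=mergelist.txt bam
  for bam in *.bam; do
    [[ -e $bam ]] || continue
    echo "$bam"
    stringtie -p 8 -f 0.3 -j 5 -G "$annot" -o "${bam%.*}.gtf" "$bam"
  done
  make_mergelist "$merged" "$list"
  stringtie --merge -p 8 -G "$annot" -o "$merged" "$list"
}
```

Usage, assuming the BAM files and the annotation sit in the working directory: `assemble_and_merge gene_models_main.gff3`.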
I have a gene of interest and want to check the assembly by looking at that gene. I do see many reads mapped to that gene region, and it is assembled in some samples (not all of them). However, the gene is missing after I merge all the individual samples (see figure).
The merged assembly is on top, and the rest are the individual assemblies. My target transcript (ID starting with STRG) is assembled in many samples but not in the merged file. Even stranger, the nearby reference gene (arahyL5ZR7F) has no coverage and is not assembled in most samples, yet it appears in the merged file. I tried merging without any filtering criteria and without the annotation file, but I still cannot see that transcript in the merged file.
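To check which transcripts cover the gene of interest in each file without a genome browser, a sketch like the following lists every `transcript` record overlapping a region. It assumes GTF input where transcript lines carry a `transcript_id "..."` attribute (as StringTie writes them); the function name `transcripts_in_region` is hypothetical.

```bash
# Hypothetical helper: print file, transcript_id, start, end and strand for
# every "transcript" feature overlapping chrom:start-end in the given GTFs.
transcripts_in_region() {
  local chrom=$1 start=$2 end=$3
  shift 3
  awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" '
    $1 == c && $3 == "transcript" && $4 + 0 <= e + 0 && $5 + 0 >= s + 0 {
      id = "."
      # transcript_id "X": the prefix is 15 characters, plus a closing quote
      if (match($9, /transcript_id "[^"]*"/))
        id = substr($9, RSTART + 15, RLENGTH - 16)
      print FILENAME "\t" id "\t" $4 "\t" $5 "\t" $7
    }' "$@"
}
```

For example, `transcripts_in_region <chrom> <start> <end> *.gtf`, where the chromosome and coordinates are placeholders for the region shown in the figure. Comparing the per-sample lines with the merged lines shows whether the STRG transcript is lost at the merge step and whether some samples report it as two fragments.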
I found this issue during downstream analysis and really don't want to redo the analysis with other assemblers. Can anyone help?
I'm not sure. Somehow the --merge option does not consider this a proper transcript (which I might also doubt, since it is >1000 nt without any introns and does not appear identical across samples?). Did you try running stringtie --merge with the "-i" option?
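For reference, the merge-mode filters can be relaxed explicitly together with `-i`. In StringTie's `--merge` mode, `-F` and `-T` are the minimum input-transcript FPKM and TPM (both default to 1.0), `-c` is the minimum input-transcript coverage, and `-f` is the minimum isoform fraction; setting them to 0 keeps low-expression transcripts such as a weakly covered single-exon gene. The function below is a hypothetical wrapper, and the `DRY_RUN` variable is an assumption added only so the command can be printed instead of executed.

```bash
# Hypothetical wrapper: stringtie --merge with retained introns kept (-i)
# and the FPKM/TPM/coverage/isoform-fraction filters set to 0.
merge_relaxed() {
  local annot=$1 list=$2 out=$3
  local cmd=(stringtie --merge -p 8 -i -F 0 -T 0 -c 0 -f 0
             -G "$annot" -o "$out" "$list")
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "${cmd[*]}"          # print the command only
  else
    "${cmd[@]}"
  fi
}
```

For example, `merge_relaxed gene_models_main.gff3 mergelist.txt stringtie_merged_relaxed.gtf` (the output name is a placeholder). Writing to a new file keeps the original merged GTF for comparison.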
Hi Kristoffer,
The -i option didn't improve the result. I also noticed that some samples assembled two separate fragments of this single-exon gene. Maybe this inconsistency among samples "confuses" StringTie?
Also, StringTie seems to follow the provided reference strictly. When I used an updated reference GFF file in which this gene is correctly annotated, the assembly of this gene looked good. I'll try StringTie2 to redo the assembly.
Hi Luo, I ran into the same situation. How did it work out for you?
It seems that StringTie uses the reference annotation as the gold standard. I used an improved annotation file in which my genes of interest are correctly annotated, and the merging step worked with that file. You can try the newest StringTie2 to see whether it works, or use another de novo assembly approach (such as Trinity) if you intend to find novel genes.
Thanks for your help, Luo. I tried an improved annotation file, but it didn't work. The new annotation I inserted into the GTF (I made sure it has the same format as the other entries and the right location) was not transferred into my output.gtf, even though I found the right transcript there, only under a "STRG.XXXXX" ID. I am not sure which part went wrong. If you have any idea, I'd appreciate it.
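One thing worth checking when annotation lines are added by hand is that they are valid GTF: nine tab-separated columns (editors often insert spaces instead of tabs), numeric coordinates with start no greater than end, a strand of `+`, `-` or `.`, and a `transcript_id` attribute on exon and transcript lines. The awk-based sketch below reports such problems. It is a hypothetical helper covering only these basic checks, not a full GTF validator, and it does not apply to GFF3 files, which use `ID=`/`Parent=` attributes instead.

```bash
# Hypothetical helper: report GTF records with basic formatting problems.
check_gtf() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }           # skip comments and blank lines
    NF != 9 {
      printf "line %d: %d tab-separated columns (expected 9)\n", NR, NF
      next
    }
    $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ {
      printf "line %d: non-numeric coordinates\n", NR
      next
    }
    $4 + 0 > $5 + 0 { printf "line %d: start > end\n", NR }
    $7 !~ /^[-+.]$/ { printf "line %d: strand must be +, - or .\n", NR }
    ($3 == "exon" || $3 == "transcript") && $9 !~ /transcript_id "[^"]+"/ {
      printf "line %d: %s without transcript_id\n", NR, $3
    }' "$1"
}
```

For example, `check_gtf my_annotation.gtf` (a placeholder name) prints nothing when every record passes these checks.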
I'm not sure what the issue is here; you could contact the developer about this problem, or just use de novo assembly.
Thank you for your help:)