Missing assembled transcripts after Stringtie merge
4.4 years ago

Hello, I'm using StringTie v2.0.4 for my RNA-seq data analysis. I have a reference genome and an annotation file, but I still want to identify novel genes because the annotation for my species is incomplete. I used:

for f in *.bam; do
    echo "${f}"
    stringtie -p 8 -f 0.3 -j 5 -G gene_models_main.gff3 -o "${f%.*}.gtf" "${f}"
done

to assemble each sample individually. Then I used:

stringtie --merge -p 8 -G gene_models_main.gff3 -o stringtie_merged.gtf ./mergelist.txt

to get the non-redundant transcript GTF file.
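The contents of mergelist.txt are not shown above. A minimal sketch of one way to build it, assuming the per-sample GTFs sit in a single directory next to the merged output (the function name make_mergelist is hypothetical):

```shell
# Print one per-sample GTF path per line, skipping a previous merged output
# so that it is not fed back into the next merge.
# Usage: make_mergelist DIR > mergelist.txt
make_mergelist() {
    dir="${1:-.}"
    for f in "$dir"/*.gtf; do
        [ -e "$f" ] || continue    # no matches: the glob stays literal
        [ "$(basename "$f")" = "stringtie_merged.gtf" ] && continue
        printf '%s\n' "$f"
    done
}
```

Since the assembly loop writes each GTF beside its BAM file, excluding stringtie_merged.gtf matters when the merge is rerun in the same directory.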

I have a gene of interest and wanted to check the assembly by looking at that gene. Many reads map to that gene region, and it is assembled in some samples (not all of them). However, the gene is missing after I merge the individual samples (see figure).

The merged assembly is at the top, and the rest are the individual assemblies. My target transcript (its ID starts with STRG) is assembled in many samples but is absent from the merged file. Even stranger, the nearby reference gene (arahyL5ZR7F) has no coverage and is not assembled in most samples, yet it is present in the merged file. I also tried merging without any filtering criteria and without the annotation file, but I still cannot see that transcript in the merged file.
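The same comparison can be made outside a genome browser by listing the transcript records that overlap the region in each GTF. A minimal sketch, assuming standard tab-separated GTF with a transcript_id attribute; the function name and the example coordinates are hypothetical:

```shell
# List transcript records that overlap a region in a GTF file.
# Prints: transcript_id, start, end, strand (tab-separated).
# Usage: transcripts_in_region FILE CHROM START END
transcripts_in_region() {
    awk -F'\t' -v chrom="$2" -v lo="$3" -v hi="$4" '
        $0 !~ /^#/ && $3 == "transcript" && $1 == chrom && $4 <= hi && $5 >= lo {
            id = "NA"
            # The prefix transcript_id " is 15 characters; drop it and the closing quote.
            if (match($9, /transcript_id "[^"]+"/))
                id = substr($9, RSTART + 15, RLENGTH - 16)
            print id "\t" $4 "\t" $5 "\t" $7
        }' "$1"
}

# Example (hypothetical chromosome name and coordinates):
#   for f in *.gtf; do
#       printf '%s\t%s\n' "$f" "$(transcripts_in_region "$f" chr01 1000 5000 | wc -l)"
#   done
```

Running it on each per-sample GTF and on stringtie_merged.gtf shows which files contain a transcript at the locus and whether the boundaries agree between samples.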

I found this issue during downstream analysis and really don't want to redo the analysis with other assemblers. Can anyone help?

RNA-Seq Assembly Stringtie • 3.1k views

I'm not sure. Somehow the --merge option does not consider this a proper transcript (which I might also doubt, as it is >1000 nt without any introns and does not appear to be identical across samples). Did you try running stringtie --merge with the "-i" option?
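For reference, a merge run with -i and the input filters set to zero might look like the sketch below. The wrapper name merge_relaxed and the STRINGTIE variable are hypothetical conveniences; -i, -c, -F, -T, -G and -o are options listed in the stringtie --merge usage text. That the default filters are what excludes the transcript is only an assumption, so this is a diagnostic to try, not a known fix.

```shell
# STRINGTIE can be overridden to point at a specific binary.
STRINGTIE="${STRINGTIE:-stringtie}"

# Merge with retained-intron transcripts kept (-i) and the minimum input
# coverage (-c), FPKM (-F) and TPM (-T) thresholds set to 0.
# Usage: merge_relaxed MERGELIST OUT_GTF [GUIDE_ANNOTATION]
merge_relaxed() {
    if [ -n "${3:-}" ]; then
        "$STRINGTIE" --merge -i -c 0 -F 0 -T 0 -G "$3" -o "$2" "$1"
    else
        # No guide annotation: merge the assembled transcripts only.
        "$STRINGTIE" --merge -i -c 0 -F 0 -T 0 -o "$2" "$1"
    fi
}

# Example:
#   merge_relaxed mergelist.txt stringtie_merged_relaxed.gtf gene_models_main.gff3
```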


Hi Kristoffer,

The -i option didn't improve the result. I also noticed that some samples assembled two separate fragments of this single-exon gene. Maybe this inconsistency among samples "confuses" StringTie?

Besides, it seems that StringTie strictly follows the provided reference. With an updated reference GFF file in which this gene is correctly annotated, the assembly of this gene looks good. I'll try using StringTie2 to redo the assembly.


Hi Luo, I ran into the same situation here. How did it work out for you?


It seems that StringTie treats the reference annotation as the gold standard. I used an improved annotation file in which my genes of interest are correctly annotated, and the merging step worked with that file. You can try the newest StringTie2 to test whether it works, or use another de novo assembly approach (such as Trinity) if you intend to find novel genes.


Thanks for your help, Luo. I tried using an improved annotation file, but it didn't work. The new annotation I inserted into the GTF (I made sure its format matches the other entries and that the location is right) was not transferred into my output.gtf, even though I found the right transcript there, labeled only "STRG.XXXXX". I am not sure which part went wrong. If you have any ideas, I'd appreciate it.
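One quick diagnostic for this situation is to search the attribute column of the output GTF for the inserted ID, since a reference ID can appear under an attribute other than transcript_id. A minimal sketch, assuming a tab-separated GTF with quoted attribute values; the function name and the example ID are hypothetical:

```shell
# Print every GTF record whose attribute column contains ID as a quoted value,
# whichever attribute key carries it.
# Usage: records_with_id FILE ID
records_with_id() {
    awk -F'\t' -v id="$2" '$0 !~ /^#/ && index($9, "\"" id "\"") > 0' "$1"
}

# Example (hypothetical ID):
#   records_with_id output.gtf my_new_gene.1
```

If nothing is printed, the ID is absent from the output altogether. If records are printed, the ID was carried over but under a different attribute than expected.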


I'm not sure what the issue is here. You could contact the developers about this problem, or just use de novo assembly.


Thank you for your help:)

2.2 years ago

Have you tried TACO for merging? In my experience TACO gives more sensitive transcript recovery.


I have indeed tried TACO, but there was no improvement.
