Question

Question regarding stringtie algorithm and assembly strategy

3.8 years ago

Rogerio Ribeiro ▴ 110

Good morning Biostars. I realize this might be a very specific question, and I have also tried to post it on the corresponding GitHub repo, but I'm afraid that might take a while.

For my project I have been tasked with assembling a transcriptome for my species of interest, using the genome as a guide (not chromosome-level; 2090 contigs) together with 23 Illumina samples.

After my initial assembly (hisat2 + stringtie2 + stringtie2 --merge pipeline) I noticed that quite a few transcripts in my assembly covered two or more reference genes. After manual inspection, I found that in many of these cases only a few reads support the splice sites joining the genes. This is a known problem with stringtie. To solve this, I decided to increase the stringency (with -c 1.5 and -j 15), which led to some improvements. My supervisor suggested that I instead concatenate the alignments from all 23 samples and feed that single file to stringtie, increasing the -j parameter. I have since read the original stringtie paper and got the idea that transcript expression levels are important for the assembly, but I'm not too sure about this.

Is it correct to use this approach, or should I assemble each sample individually?

Assembly RNA-Seq • 851 views

updated 3.8 years ago by i.sudbery 20k • written 3.8 years ago by Rogerio Ribeiro ▴ 110

score 0 · Answer 1 · 2021-02-01

3.8 years ago

i.sudbery 20k

One would normally assemble each sample separately and then use stringtie merge to combine the assemblies, keeping only those transcripts that have enough support, but I'm not sure anyone has ever done a proper systematic comparison of the two approaches.
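
For reference, the per-sample route could be sketched roughly as below. This only builds the command lines and does not run anything. The function name, the output directory, and the file names are made up for illustration; the -c and -j values are simply the ones mentioned in the question, not a recommendation.

```python
from pathlib import Path


def per_sample_then_merge(bams, out_dir="assembly", min_cov=1.5, min_junc=15):
    """Build (but do not run) the commands for per-sample assembly plus merge.

    bams: paths to coordinate-sorted BAM files, one per sample.
    out_dir, min_cov and min_junc are illustrative assumptions.
    """
    out = Path(out_dir)
    gtfs = [(out / (Path(b).stem + ".gtf")).as_posix() for b in bams]

    # One StringTie run per sample:
    # -c is the minimum read coverage, -j the minimum junction coverage.
    cmds = [
        ["stringtie", str(b), "-o", g, "-c", str(min_cov), "-j", str(min_junc)]
        for b, g in zip(bams, gtfs)
    ]

    # A single merge step over all per-sample GTFs.
    merged = (out / "merged.gtf").as_posix()
    cmds.append(["stringtie", "--merge", "-o", merged] + gtfs)
    return cmds
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists. The merge step has its own filters as well, which is where the "enough support" part comes in.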

3.8 years ago by i.sudbery 20k

I searched the literature and I have not found a single example of my approach. My logic is that by using all the reads as evidence I have better control of the -j parameter to decrease trans-splicing sites (splicing sites between different reference genes).
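
To make the pooled idea concrete, here is a rough sketch that only builds the two command lines involved. The function name and file names are hypothetical, and the -j value is a placeholder: junction read counts add up across the pooled samples, so whatever threshold is used would have to be chosen with that in mind.

```python
def pooled_assembly(bams, pooled_bam="pooled.bam", out_gtf="pooled.gtf", min_junc=15):
    """Build (but do not run) the commands for a single pooled assembly.

    bams: coordinate-sorted BAM files, one per sample.
    pooled_bam, out_gtf and min_junc are illustrative assumptions.
    """
    # samtools merge takes the output file first, then the input BAMs.
    merge = ["samtools", "merge", pooled_bam] + [str(b) for b in bams]

    # One StringTie run on the pooled alignments;
    # -j is the minimum junction coverage.
    assemble = ["stringtie", pooled_bam, "-o", out_gtf, "-j", str(min_junc)]
    return [merge, assemble]
```

Calling `pooled_assembly(["s1.bam", "s2.bam"])` returns the merge command followed by the assembly command, which could then be run in that order with `subprocess.run(cmd, check=True)`.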

Reply • 3.8 years ago by Rogerio Ribeiro ▴ 110