StringTie merge GTF results in large increase in number of transcripts
4.5 years ago
nattzy94

Hi,

I have 5 replicates of RNA-seq data. I aligned the reads with STAR and then assembled transcripts with StringTie for each replicate individually, giving me 5 GTF files. I then ran StringTie in merge mode (stringtie --merge) on the 5 GTF files to get a single merge.gtf.
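For reference, the workflow described above can be sketched as follows. This is a minimal illustration, not the poster's actual commands: the file names rep1.bam ... rep5.bam, the output names and the thread count are hypothetical assumptions. The script only prints the commands. Pass each list to subprocess.run(cmd, check=True) if you want to execute them.

```python
# Hypothetical inputs: rep1.bam ... rep5.bam (sorted STAR alignments).
N_REPLICATES = 5
THREADS = 8  # assumption; adjust to your machine


def assembly_commands(n, threads):
    """Build one StringTie assembly command per replicate BAM."""
    cmds = []
    for i in range(1, n + 1):
        cmds.append(["stringtie", f"rep{i}.bam", "-p", str(threads), "-o", f"rep{i}.gtf"])
    return cmds


def merge_command(gtfs, out="merge.gtf"):
    """Build a 'stringtie --merge' command over the per-replicate GTFs."""
    return ["stringtie", "--merge", "-o", out, *gtfs]


if __name__ == "__main__":
    asm = assembly_commands(N_REPLICATES, THREADS)
    for cmd in asm:
        print(" ".join(cmd))
    # The last element of each assembly command is its output GTF.
    print(" ".join(merge_command([cmd[-1] for cmd in asm])))
```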

When I run wc -l on the individual GTF files, I get an average of 700,000+ lines. When I do the same for merge.gtf, I get 1.7 million. Since these are all replicates, shouldn't the merged GTF contain roughly the same number of transcripts as each individual GTF?
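Note that wc -l counts every line in a GTF, including header comment lines and one line per exon, so it is not a transcript count. Two files with similar numbers of transcripts can therefore differ in line count if their exon-to-transcript ratios differ. A fairer comparison counts only features whose third column is "transcript". The sketch below does that. It assumes standard tab-separated GTF with the feature type in column 3, which is how StringTie writes its output.

```python
import sys


def count_features(lines):
    """Return (total_lines, transcript_features, exon_features) for GTF lines."""
    total = transcripts = exons = 0
    for line in lines:
        total += 1
        if line.startswith("#"):
            continue  # header/comment lines are counted by wc -l but are not features
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid 9-column GTF record
        if fields[2] == "transcript":
            transcripts += 1
        elif fields[2] == "exon":
            exons += 1
    return total, transcripts, exons


if __name__ == "__main__":
    # Usage: python count_gtf.py rep1.gtf rep2.gtf ... merge.gtf
    for path in sys.argv[1:]:
        with open(path) as fh:
            total, tx, ex = count_features(fh)
        print(f"{path}\tlines={total}\ttranscripts={tx}\texons={ex}")
```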

Did I do something wrong in the merge step?

RNA-Seq Assembly
4.5 years ago

The vast majority of them could be low-abundance transcripts, i.e., artifacts of the expression of neighbouring high-abundance transcripts, otherwise known as 'transcriptional noise'. Some could also come from false-positive alignments from STAR. You could raise the filtering thresholds when running stringtie --merge; take a look at the bottom of this page:
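Before choosing higher thresholds, it can help to check how many assembled transcripts in each per-replicate GTF have low abundance. StringTie's merge mode has minimum-abundance options for input transcripts: -T (TPM), -F (FPKM) and -c (coverage). See the StringTie manual for their defaults. The sketch below reports the fraction of transcripts below a TPM cutoff. It assumes that each "transcript" line in the StringTie GTF carries a TPM "..." attribute. The cutoff is whatever you pass on the command line and is purely illustrative.

```python
import re
import sys

# Matches the TPM attribute StringTie writes on transcript lines, e.g. TPM "2.345678";
TPM_RE = re.compile(r'\bTPM "([^"]+)"')


def transcript_tpms(lines):
    """Yield the TPM value of every 'transcript' feature in a StringTie GTF."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        m = TPM_RE.search(fields[8])
        if m:
            yield float(m.group(1))


def fraction_below(tpms, cutoff):
    """Fraction of TPM values strictly below cutoff (0.0 if there are none)."""
    tpms = list(tpms)
    if not tpms:
        return 0.0
    return sum(t < cutoff for t in tpms) / len(tpms)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python low_tpm.py <tpm_cutoff> rep1.gtf rep2.gtf ...
    cutoff = float(sys.argv[1])
    for path in sys.argv[2:]:
        with open(path) as fh:
            frac = fraction_below(transcript_tpms(fh), cutoff)
        print(f"{path}\tfraction with TPM < {cutoff}: {frac:.3f}")
```

If a large fraction of transcripts falls just below or above the cutoff, that suggests raising -T (or -F / -c) in the merge step is worth trying.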

If this does not help, please share your full STAR and StringTie commands.

Kevin
