Question

Help regarding RNA-seq STRINGTIE pipeline and output

7.4 years ago

pixie@bioinfo ★ 1.5k

Hello, I have started working on RNA-seq data. I have control samples at 0 h (2 replicates) and 4 other time points, each also with 2 replicates. I have two objectives: 1) construct a co-expression network with WGCNA across all samples, including replicates, and 2) identify differentially expressed genes at each time point relative to 0 h. I am interested only in gene-level analysis, so I need normalized read counts for network construction; I am not looking for novel genes or isoforms at this point. I have performed the initial QC and alignment and generated the BAM files. I then ran the first StringTie step as described in the protocol at http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html:

stringtie -p 4 -G annotation.gtf -o test_out.gtf accepted_hits.bam

Sample output (test_out.gtf):

chr01 StringTie transcript 3005 3259 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.929412"; FPKM "8.213017"; TPM "11.184443";
chr01 StringTie exon 3005 3259 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "2.929412";
chr01 StringTie transcript 16399 20144 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.086694"; FPKM "8.653979"; TPM "11.784943";
chr01 StringTie exon 16399 16976 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "1.583477";
chr01 StringTie exon 17383 17474 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "2"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.401087";
chr01 StringTie exon 17558 18258 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "3"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.796742";
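To compare these transcript records with the gene-level table, it can help to pull the values out of the attribute column. A minimal awk sketch, assuming the standard tab-separated 9-column GTF that `stringtie -o` writes; the function name `transcript_values` is hypothetical:

```shell
# Print one line per transcript record of a StringTie GTF:
#   transcript_id, ref_gene_id ("-" if absent), FPKM, TPM
# Assumes tab-separated GTF; exon records are ignored.
transcript_values() {
  awk -F'\t' '
    $3 == "transcript" {
      tid = "-"; ref = "-"; fpkm = "-"; tpm = "-"
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        sub(/^ +/, "", a)              # drop the space after each ";"
        if (a == "") continue          # trailing empty field after the last ";"
        key = a; sub(/ .*/, "", key)   # attribute name, e.g. FPKM
        val = a; sub(/^[^ ]+ /, "", val); gsub(/"/, "", val)
        if (key == "transcript_id") tid = val
        else if (key == "ref_gene_id") ref = val
        else if (key == "FPKM") fpkm = val
        else if (key == "TPM") tpm = val
      }
      print tid "\t" ref "\t" fpkm "\t" tpm
    }' "$@"
}
```

For example, `transcript_values test_out.gtf` would list STRG.1.1 with "-" as its ref_gene_id and STRG.2.1 with Os01g0100500.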

I also extracted gene abundances in tab-delimited format with the -A option, after the previous step.

Sample output (gene abundance table from -A):

Gene ID       Gene Name  Reference  Strand  Start  End    Coverage  FPKM       TPM
Os01g0100100  -          chr01      +       2983   10815  2.604207  7.30126    9.942816
STRG.1        -          chr01      +       3005   3259   2.929412  8.213017   11.184443
Os01g0100200  -          chr01      +       11218  12435  0.051337  0.14393    0.196004
Os01g0100300  -          chr01      -       11372  12284  0.341975  0.958775   1.305655
Os01g0100400  -          chr01      +       12721  15685  0.296166  0.830343   1.130756
Os01g0100466  -          chr01      -       12808  13978  0.19112   0.535832   0.729693
Os01g0100500  -          chr01      +       16399  20144  4.431273  12.423696  16.918522
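A similar sketch for the -A table, assuming it is tab-delimited with Gene ID in column 1 and FPKM and TPM in columns 8 and 9, as in the header above (the name `gene_tpm` is hypothetical). For the single-transcript locus STRG.1 the two files agree (FPKM 8.213017, TPM 11.184443), so listing the gene-level values next to the transcript-level ones makes it easier to see which genes differ.

```shell
# Print Gene ID, FPKM and TPM from one or more `stringtie -A` tables,
# skipping the header row of each file (FNR restarts at 1 per file).
gene_tpm() {
  awk -F'\t' 'FNR > 1 { print $1 "\t" $8 "\t" $9 }' "$@"
}
```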

The FPKM and TPM values don't seem to match between the two files. I also don't understand what STRG means in these files. Any suggestions for understanding the outputs, and on the pipeline in general, would be greatly appreciated.

Thanks

A few comments:
- For the "STRG" prefix, read the StringTie manual.
- If you want to estimate the abundance of the given reference transcripts only, use the -e flag together with -G.

7.4 years ago by PoGibas
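Following the -e suggestion, a minimal dry-run sketch that prints one reference-only StringTie command per BAM file. It reuses `-p 4` and annotation.gtf from the question; the function name `stringtie_cmds` and the way output names are derived from each BAM file name are assumptions for illustration. With -e, StringTie only estimates abundances for transcripts given with -G, and -A writes the gene-level table.

```shell
# Dry run: print (do not execute) one StringTie command per BAM file.
#   -e : only estimate abundances of the reference transcripts from -G
#   -A : also write a gene-level abundance table
# Output names are derived from the BAM file name (an assumption).
stringtie_cmds() {
  local bam base
  for bam in "$@"; do
    base=$(basename "$bam" .bam)
    echo stringtie -e -p 4 -G annotation.gtf -o "${base}.gtf" -A "${base}_gene_abund.tab" "$bam"
  done
}
```

For example, `stringtie_cmds ctrl_rep1.bam ctrl_rep2.bam` lists two commands; once they look right, and assuming file names without spaces, they can be run with `stringtie_cmds *.bam | bash`.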

Thanks for your input, I will look into it.

7.4 years ago by pixie@bioinfo

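STRG is the prefix StringTie gives to the genes and transcripts it assembles. Assembled transcripts that match the annotation provided with -G also carry the reference IDs in reference_id and ref_gene_id (as for STRG.2.1 above), while loci with no reference match, such as STRG.1, keep only the STRG ID.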
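StringTie numbers STRG loci independently in each run, so STRG.1 in one sample is not necessarily the same locus as STRG.1 in another. For a gene-level matrix across all ten samples (for example as WGCNA input), one option is to keep only reference gene IDs from each sample's -A table. A minimal awk sketch, assuming tab-delimited -A files with one header row, the file name used as the sample name, and a hypothetical function name `tpm_matrix`. TPM is used only for illustration; count-based differential expression tools expect raw read counts instead.

```shell
# Build a gene x sample TPM matrix from several `stringtie -A` tables.
# Assumes Gene ID in column 1, TPM in column 9, one header row per file.
# STRG.* rows are skipped because those IDs are assigned per run.
# Genes missing from a sample are filled with 0.
tpm_matrix() {
  awk -F'\t' '
    FNR == 1 { f++; sample[f] = FILENAME; next }  # header row starts a new sample
    $1 ~ /^STRG\./ { next }                        # skip novel, per-run STRG IDs
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++g] = $1 }
      tpm[$1, f] = $9                              # last value wins if an ID repeats
    }
    END {
      printf "gene_id"
      for (j = 1; j <= f; j++) printf "\t%s", sample[j]
      printf "\n"
      for (i = 1; i <= g; i++) {
        printf "%s", order[i]
        for (j = 1; j <= f; j++) {
          k = order[i] SUBSEP j
          printf "\t%s", ((k in tpm) ? tpm[k] : 0)  # 0 if absent from a sample
        }
        printf "\n"
      }
    }' "$@"
}
```

For example, `tpm_matrix ctrl_rep1_gene_abund.tab ctrl_rep2_gene_abund.tab > tpm_matrix.tsv` would write one row per reference gene and one column per file.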