Help regarding RNA-seq STRINGTIE pipeline and output
7.4 years ago
pixie@bioinfo ★ 1.5k

Hello, I have started working on RNA-seq data. I have samples at 0 h (control, two replicates) and at four other time points, each with two replicates. I have two objectives: 1) construct a network with WGCNA across all samples, including replicates; 2) identify differentially expressed genes at each time point relative to 0 h. I am interested only in gene-level analysis, so I need normalized read counts for network construction; I am not looking for novel genes or isoforms at this point. I have done the initial QC and alignment and generated the BAM files. I then ran the first StringTie step as described in http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html:

stringtie -p 4 -G annotation.gtf -o test_out.gtf - accepted_hits.bam

Sample output:

chr01 StringTie transcript 3005 3259 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.929412"; FPKM "8.213017"; TPM "11.184443";
chr01 StringTie exon 3005 3259 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "2.929412";
chr01 StringTie transcript 16399 20144 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.086694"; FPKM "8.653979"; TPM "11.784943";
chr01 StringTie exon 16399 16976 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "1.583477";
chr01 StringTie exon 17383 17474 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "2"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.401087";
chr01 StringTie exon 17558 18258 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "3"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.796742";
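A minimal Python sketch of reading these records, assuming the GTF columns are tab-separated as StringTie writes them (the excerpt above lost its tabs). It pulls the `key "value";` attributes out of column 9 and reports whether a transcript carries a reference_id attribute. The function names parse_gtf_line and describe_transcript are hypothetical helpers, not part of StringTie.

```python
import re

# Matches each  key "value";  pair in column 9 of a GTF record.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Split one GTF record into its fixed columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    return {
        "seqname": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
        "attrs": dict(ATTR_RE.findall(fields[8])),
    }


def describe_transcript(rec):
    """Say whether StringTie matched this transcript to a -G reference transcript."""
    a = rec["attrs"]
    if "reference_id" in a:
        return f'{a["transcript_id"]}: matches {a["reference_id"]} (gene {a["ref_gene_id"]})'
    return f'{a["transcript_id"]}: no reference_id'


# Two transcript records from the sample output, with tabs restored (assumption).
lines = [
    'chr01\tStringTie\ttranscript\t3005\t3259\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.929412"; FPKM "8.213017"; TPM "11.184443";',
    'chr01\tStringTie\ttranscript\t16399\t20144\t1000\t+\t.\tgene_id "STRG.2"; transcript_id "STRG.2.1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.086694"; FPKM "8.653979"; TPM "11.784943";',
]

for line in lines:
    rec = parse_gtf_line(line)
    if rec["feature"] == "transcript":
        print(describe_transcript(rec), "FPKM", rec["attrs"]["FPKM"])
```

Both transcripts carry STRG IDs; only the attributes tell the reference-matched one (STRG.2.1) apart from the unmatched one (STRG.1.1).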

I also extracted gene abundances in tab-delimited format with the -A option, after the previous step.

Sample output:

Gene ID       Gene Name  Reference  Strand  Start  End    Coverage  FPKM       TPM
Os01g0100100  -          chr01      +       2983   10815  2.604207  7.30126    9.942816
STRG.1        -          chr01      +       3005   3259   2.929412  8.213017   11.184443
Os01g0100200  -          chr01      +       11218  12435  0.051337  0.14393    0.196004
Os01g0100300  -          chr01      -       11372  12284  0.341975  0.958775   1.305655
Os01g0100400  -          chr01      +       12721  15685  0.296166  0.830343   1.130756
Os01g0100466  -          chr01      -       12808  13978  0.19112   0.535832   0.729693
Os01g0100500  -          chr01      +       16399  20144  4.431273  12.423696  16.918522
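To combine these tables across samples (for example into a gene × sample matrix for WGCNA), each -A file can be read into a dictionary keyed by Gene ID. A minimal sketch, assuming the header shown above and tab-separated columns; read_gene_abund and merge_samples are hypothetical helpers, and summing values for a Gene ID that appears on more than one row is an assumption of this sketch, not documented StringTie behaviour.

```python
import csv
import io


def read_gene_abund(handle, value="TPM"):
    """Return {gene_id: value} from a StringTie -A gene abundance table."""
    out = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        gid = row["Gene ID"]
        # Assumption: if a Gene ID occurs on several rows, add the values up.
        out[gid] = out.get(gid, 0.0) + float(row[value])
    return out


def merge_samples(tables):
    """{sample: {gene: value}} -> (sorted gene list, {sample: values in gene order}).

    Genes missing from a sample are filled with 0.0 (a choice made for this sketch).
    """
    genes = sorted(set().union(*tables.values()))
    matrix = {s: [t.get(g, 0.0) for g in genes] for s, t in tables.items()}
    return genes, matrix


# Three rows copied from the sample output, with tabs restored (assumption).
SAMPLE = (
    "Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n"
    "Os01g0100100\t-\tchr01\t+\t2983\t10815\t2.604207\t7.30126\t9.942816\n"
    "STRG.1\t-\tchr01\t+\t3005\t3259\t2.929412\t8.213017\t11.184443\n"
    "Os01g0100500\t-\tchr01\t+\t16399\t20144\t4.431273\t12.423696\t16.918522\n"
)

tpm = read_gene_abund(io.StringIO(SAMPLE))
print(len(tpm), tpm["Os01g0100500"])
```

With one file per sample, merge_samples({"0h_rep1": tpm_a, "0h_rep2": tpm_b, ...}) would give a matrix whose rows follow the sorted gene list; the sample labels here are illustrative.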

The FPKM and TPM values don't seem to match between the two files. Also, I don't understand what STRG means in these files. Any suggestions for understanding the outputs, and about the pipeline in general, would be greatly appreciated.
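One quick check on the numbers above, using only the values already shown: within a single sample, TPM and FPKM differ by one sample-wide factor (under the standard definition, TPM_i = FPKM_i / Σ_j FPKM_j × 10^6), so the TPM/FPKM ratio should be the same on every row of both files. The sketch below computes it; tpm_per_fpkm is a hypothetical helper.

```python
# FPKM and TPM values copied from the two sample outputs in this post.
transcript_level = {  # transcript records in test_out.gtf
    "STRG.1.1": (8.213017, 11.184443),
    "STRG.2.1": (8.653979, 11.784943),
}
gene_level = {  # rows of the -A gene abundance table
    "Os01g0100100": (7.30126, 9.942816),
    "STRG.1": (8.213017, 11.184443),
    "Os01g0100500": (12.423696, 16.918522),
}


def tpm_per_fpkm(table):
    """Ratio TPM / FPKM for each (FPKM, TPM) pair."""
    return {name: tpm / fpkm for name, (fpkm, tpm) in table.items()}


ratios = {**tpm_per_fpkm(transcript_level), **tpm_per_fpkm(gene_level)}
for name, r in ratios.items():
    print(f"{name}\t{r:.4f}")
```

All five ratios come out at about 1.3618, so the two files are scaled consistently. The apparent mismatch is between levels: STRG.1 has a single transcript, and its gene and transcript values are identical (FPKM 8.213017), whereas the gene-level Os01g0100500 row (FPKM 12.423696) is larger than the single transcript STRG.2.1 (FPKM 8.653979), presumably because other transcripts assigned to that gene, not shown in the excerpt, add to the gene total.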

Thanks


A few comments:
- For the "STRG" prefix, read the StringTie manual.
- If you want to estimate the abundance of the given reference transcripts only, use the -e flag together with -G.
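A minimal sketch of that suggestion in Python, assuming one BAM per sample: it only builds the argument list for a re-run of StringTie with -e (reference transcripts only), -B (Ballgown tables, written next to the -o GTF) and -A (gene abundance table). The helper name stringtie_quant_cmd, the ballgown/<sample>/ layout and the sample label 0h_rep1 are illustrative assumptions; the command is printed, not executed.

```python
import shlex


def stringtie_quant_cmd(bam, sample, gtf="annotation.gtf", threads=4):
    """Build a StringTie argument list for reference-only quantification.

    -e: estimate abundances of the -G reference transcripts only
    -B: write Ballgown input tables into the directory of the -o file
    -A: write the gene abundance table
    """
    outdir = f"ballgown/{sample}"  # hypothetical output layout
    return [
        "stringtie", "-e", "-B",
        "-p", str(threads),
        "-G", gtf,
        "-o", f"{outdir}/{sample}.gtf",
        "-A", f"{outdir}/gene_abund.tab",
        bam,
    ]


cmd = stringtie_quant_cmd("accepted_hits.bam", "0h_rep1")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run it, one call per sample. With -e only transcripts from annotation.gtf are reported, so the unmatched STRG entries should no longer appear in the outputs.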


Thanks for your input, will look into it.


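STRG is the prefix StringTie gives to the genes and transcripts it assembles (the default label, changeable with -l). Assembled transcripts that match an annotated transcript from -G still get a STRG ID but also carry reference_id and ref_gene_id attributes; the ones without these attributes are the newly assembled ones.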
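Following from this, a minimal sketch that separates the Gene IDs of the -A table in the question into reference-named and StringTie-assembled ones by prefix. It assumes the default label STRG. In the sample -A output, the reference-matched gene is listed under its reference ID (Os01g0100500 rather than STRG.2), so a prefix test is meaningful there; in the transcript GTF, checking for a reference_id attribute is the safer test. split_by_prefix is a hypothetical helper.

```python
def split_by_prefix(gene_ids, label="STRG"):
    """Partition gene IDs into reference-named and StringTie-assembled IDs.

    `label` mirrors StringTie's -l option (default "STRG").
    """
    tag = label + "."
    assembled = [g for g in gene_ids if g.startswith(tag)]
    reference = [g for g in gene_ids if not g.startswith(tag)]
    return reference, assembled


# Gene IDs from the sample -A output in the question.
ids = ["Os01g0100100", "STRG.1", "Os01g0100200", "Os01g0100300",
       "Os01g0100400", "Os01g0100466", "Os01g0100500"]
ref, novel = split_by_prefix(ids)
print("reference:", ref)
print("assembled:", novel)
```

For a gene-level matrix limited to annotated genes, only the reference list would be kept.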


Thanks for the input
