Hello, I have started working on RNA-seq data. I have samples at 0 h (control, 2 replicates) and at 4 other time points, each with two replicates. I have two objectives: 1) construct a network using WGCNA across all samples, including replicates; 2) identify differentially expressed genes at each time point relative to 0 h. I am interested only in gene-level analysis, so I need normalized read counts per gene for network construction; I am not looking for novel genes/isoforms at this point. I have performed the initial QC and alignment and have generated the BAM files. I then ran the first set of commands with StringTie as described in http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html
stringtie -p 4 -G annotation.gtf -o test_out.gtf accepted_hits.bam
Sample output (test_out.gtf):
chr01 StringTie transcript 3005 3259 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.929412"; FPKM "8.213017"; TPM "11.184443";
chr01 StringTie exon 3005 3259 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "2.929412";
chr01 StringTie transcript 16399 20144 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.086694"; FPKM "8.653979"; TPM "11.784943";
chr01 StringTie exon 16399 16976 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "1.583477";
chr01 StringTie exon 17383 17474 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "2"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.401087";
chr01 StringTie exon 17558 18258 1000 + . gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "3"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.796742";
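To make the attribute column of these GTF records easier to read, here is a minimal Python sketch (an illustration, not a StringTie utility) that parses the STRG.1.1 records shown above, sums exon lengths per transcript, and turns the per-base coverage (`cov`) into a rough read count as coverage × transcript length / read length. That conversion is the idea behind the prepDE.py helper distributed with StringTie for building count matrices; check its documentation for the exact rounding and default read length. The 100 bp read length and all function names here are hypothetical, and the records are written with tabs because real GTF columns are tab-separated.

```python
import re
from collections import defaultdict

# The STRG.1.1 transcript and exon records from the output above (tab-separated, as in a real GTF)
GTF_LINES = [
    'chr01\tStringTie\ttranscript\t3005\t3259\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.929412"; FPKM "8.213017"; TPM "11.184443";',
    'chr01\tStringTie\texon\t3005\t3259\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "2.929412";',
]

# Matches key "value" pairs in the 9th GTF column
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def parse_attributes(field):
    """Turn 'key "value"; key2 "value2";' into a dict."""
    return dict(ATTR_RE.findall(field))

def transcript_records(lines):
    """Collect transcript attributes and add the summed exon length of each transcript."""
    transcripts = {}
    exon_len = defaultdict(int)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip headers or malformed lines
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        tid = attrs["transcript_id"]
        if feature == "transcript":
            transcripts[tid] = attrs
        elif feature == "exon":
            exon_len[tid] += end - start + 1  # GTF coordinates are 1-based and inclusive
    for tid, attrs in transcripts.items():
        attrs["length"] = exon_len[tid]
    return transcripts

def approx_read_count(cov, length, read_len):
    """Rough read count: per-base coverage x transcript length / read length."""
    return cov * length / read_len

records = transcript_records(GTF_LINES)
rec = records["STRG.1.1"]
est = approx_read_count(float(rec["cov"]), rec["length"], read_len=100)  # 100 bp is a placeholder
print(rec["gene_id"], rec["length"], round(est, 2))  # prints: STRG.1 255 7.47
```

Only STRG.1.1 is used because the excerpt shows just the first three exons of STRG.2.1, so its full length cannot be recovered from the post.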
I also extracted gene abundances in tab-delimited format using the -A option after the previous step.
Sample output (-A gene abundance file):
Gene ID       Gene Name  Reference  Strand  Start  End    Coverage  FPKM       TPM
Os01g0100100  -          chr01      +       2983   10815  2.604207  7.30126    9.942816
STRG.1        -          chr01      +       3005   3259   2.929412  8.213017   11.184443
Os01g0100200  -          chr01      +       11218  12435  0.051337  0.14393    0.196004
Os01g0100300  -          chr01      -       11372  12284  0.341975  0.958775   1.305655
Os01g0100400  -          chr01      +       12721  15685  0.296166  0.830343   1.130756
Os01g0100466  -          chr01      -       12808  13978  0.19112   0.535832   0.729693
Os01g0100500  -          chr01      +       16399  20144  4.431273  12.423696  16.918522
The FPKM and TPM values don't seem to match between the two files (for example, Os01g0100500 has FPKM 12.423696 in the -A file, while its transcript STRG.2.1 in the GTF has FPKM 8.653979). I also don't understand what STRG means in these files. Any suggestions for understanding the outputs, and on the pipeline in general, would be greatly appreciated.
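One quick consistency check: within a single sample, TPM is a fixed rescaling of FPKM (TPM = FPKM / (sum of all FPKM in the sample) × 10^6), so TPM/FPKM should be the same constant for every row of both files. The Python sketch below computes that ratio for rows copied from the two outputs above; the labels are only for display. If the ratios agree, both files use the same scaling and the differences come from what each row describes: a gene-level row in the -A file is not the same entity as a single assembled transcript in the GTF. That the -A value for Os01g0100500 is higher than the one for STRG.2.1 would fit the gene-level figure pooling more than one transcript, but the excerpt does not show the other transcripts, so treat that as an assumption to verify.

```python
# (label, FPKM, TPM) triples copied from the two outputs above
ROWS = [
    ("STRG.1.1 (transcript, GTF)", 8.213017, 11.184443),
    ("STRG.2.1 (transcript, GTF)", 8.653979, 11.784943),
    ("Os01g0100100 (gene, -A)", 7.30126, 9.942816),
    ("Os01g0100300 (gene, -A)", 0.958775, 1.305655),
    ("Os01g0100500 (gene, -A)", 12.423696, 16.918522),
]

def tpm_fpkm_ratios(rows):
    """TPM/FPKM per row; within one sample this should be a single constant."""
    return {label: tpm / fpkm for label, fpkm, tpm in rows}

ratios = tpm_fpkm_ratios(ROWS)
for label, ratio in ratios.items():
    print(f"{label}: TPM/FPKM = {ratio:.4f}")

# Spread between the largest and smallest ratio; near zero means a common scaling
spread = max(ratios.values()) - min(ratios.values())
print(f"spread between rows: {spread:.2e}")
```

Very small values (such as Os01g0100200) are left out because six-decimal rounding makes their ratio noisier.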
Thanks
A few comments:
- For the "STRG" prefix, read the StringTie manual.
- If you want to estimate the abundance of the given reference transcripts only, use the -e flag together with -G.

Thanks for your input, I will look into it.
STRG should be the prefix for any newly assembled transcript, i.e. one that is not among the reference transcripts provided with -G.

Thanks for the input.
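The GTF excerpt in the question shows how this looks in practice: both assembled transcripts carry StringTie's STRG label, but STRG.2.1 also has reference_id and ref_gene_id attributes linking it to Os01t0100500-01 / Os01g0100500, whereas STRG.1.1 has neither. A small Python sketch (illustrative only, not a StringTie tool; the function name is hypothetical) that lists this mapping from the attribute column:

```python
import re

# Matches key "value" pairs in the 9th GTF column
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

# Attribute columns of the two transcript records shown in the question
TRANSCRIPT_ATTRS = [
    'gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.929412"; FPKM "8.213017"; TPM "11.184443";',
    'gene_id "STRG.2"; transcript_id "STRG.2.1"; reference_id "Os01t0100500-01"; ref_gene_id "Os01g0100500"; cov "3.086694"; FPKM "8.653979"; TPM "11.784943";',
]

def strg_to_reference(attr_fields):
    """Map each StringTie transcript ID to (reference transcript, reference gene), or (None, None)."""
    mapping = {}
    for field in attr_fields:
        attrs = dict(ATTR_RE.findall(field))
        mapping[attrs["transcript_id"]] = (attrs.get("reference_id"), attrs.get("ref_gene_id"))
    return mapping

for tid, (ref_tx, ref_gene) in strg_to_reference(TRANSCRIPT_ATTRS).items():
    status = f"matches {ref_tx} ({ref_gene})" if ref_tx else "no reference_id attribute"
    print(f"{tid}: {status}")
```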