Differential Expression in DESeq2 to GSEA (9 samples, 3 conditions)
6.7 years ago

Hello,

I am working with RNA-seq data and trying to load the StringTie output from "prepDE.py" for all 9 of my samples into DESeq2 to perform differential expression analysis across my three conditions. Here is how my data is set up:

cell line 1:
sample1 (control)
sample2 (knockdown)
sample3 (overexpression)

cell line 2: 
sample4 (control)
sample5 (knockdown)
sample6 (overexpression)

cell line 3:
sample7 (control)
sample8 (knockdown)
sample9 (overexpression)

I have generated a "transcript_count_matrix.csv" file with prepDE.py and a merged_transcripts.gtf file with stringtie --merge for all 9 samples, with FPKM values and Ensembl IDs.

I also have the output for each sample from stringtie -e -B:

sample1.gtf  e2t.ctab  e_data.ctab  i2t.ctab  i_data.ctab  t_data.ctab

How can I perform differential expression analysis with DESeq2 using this StringTie output? I would like to compare the 3 control samples against the 3 knockdown and the 3 overexpression samples, and to have the results in a format that I can use as a .gct input file for Gene Set Enrichment Analysis.

This would be much like how cuffdiff outputs fpkm_tracking files with gene symbols and FPKM values; I would like something similar from this pipeline.

Any suggestions on how to proceed and any help would be greatly appreciated!!

Thanks so much,

Bryce

RNA-Seq stringtie deseq2 differential expression • 2.5k views
6.7 years ago

If you want to use DESeq2 for differential expression analysis, you should start from raw counts, not FPKM values. For confirmation, see Gordon's comments in the thread "Differential expression of RNA-seq data using limma and voom()".
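To illustrate why FPKM values are not interchangeable with raw counts, here is a minimal arithmetic sketch in Python. The numbers are hypothetical and chosen only for illustration. FPKM divides the count by both transcript length and library size, so the same raw count yields different FPKM values at different sequencing depths. DESeq2 models counts directly and applies its own library-size normalization, which is why it expects un-normalized integers as input.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical values, for illustration only: one 2,000 bp transcript
# with 500 fragments, observed in two libraries of different depth.
shallow = fpkm(500, 2000, 10_000_000)  # library with 10 million mapped fragments
deep = fpkm(500, 2000, 40_000_000)     # library with 40 million mapped fragments

print(shallow, deep)  # 25.0 6.25
```

The raw count (500) is identical in both cases, but the FPKM values differ by a factor of four, so the information a count-based model relies on is no longer directly available.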

In your situation, I can understand why you were using StringTie. I would do the following:

  1. Start with your merged_transcripts.gtf and raw FASTQ files (or aligned BAMs).
  2. Determine raw read counts over your GTF with kallisto or Salmon (from FASTQs), featureCounts or BEDTools (from BAMs), or something else.
  3. Input the raw counts into DESeq2 and conduct differential expression analysis.
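For the GSEA part of the question, the sketch below writes an expression table in GCT v1.2 layout using only the Python standard library. It assumes you have already exported a normalized expression table from DESeq2 (for example from `counts(dds, normalized=TRUE)`). The function name `write_gct`, the gene IDs, and the values are hypothetical placeholders, not part of any library or of the original data.

```python
import io


def write_gct(handle, sample_names, rows):
    """Write rows of (feature_id, description, values) in GCT v1.2 layout."""
    rows = list(rows)
    for feature_id, _description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(
                f"{feature_id}: expected {len(sample_names)} values, got {len(values)}"
            )
    handle.write("#1.2\n")                               # format version line
    handle.write(f"{len(rows)}\t{len(sample_names)}\n")  # number of rows, number of samples
    handle.write("\t".join(["NAME", "Description", *sample_names]) + "\n")
    for feature_id, description, values in rows:
        fields = [feature_id, description, *(str(v) for v in values)]
        handle.write("\t".join(fields) + "\n")


# Hypothetical example: controls (samples 1, 4, 7) vs. knockdowns (samples 2, 5, 8).
samples = ["sample1", "sample4", "sample7", "sample2", "sample5", "sample8"]
rows = [
    ("GENE_A", "na", [10.5, 11.0, 9.8, 3.2, 2.9, 3.5]),
    ("GENE_B", "na", [4.1, 3.9, 4.4, 8.7, 9.1, 8.2]),
]

buffer = io.StringIO()
write_gct(buffer, samples, rows)
gct_text = buffer.getvalue()
print(gct_text)
```

To write to disk, pass an open file handle instead of the `StringIO` buffer. GSEA also expects a separate phenotype labels file (.cls) that assigns each sample column to a condition, listed in the same column order as the .gct file.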

Kevin
