Question

DESeq2 input

13 months ago

r.shiasi3897 ▴ 10

Hello everyone, @Michael Love. I have a transcriptomics dataset and ran the nf-core/rnaseq pipeline with the salmon-star option. The salmon-star output folder contains the following files:

salmon.merged.gene_counts.tsv
salmon.merged.gene_counts_length_scaled.tsv
salmon.merged.gene_counts_scaled.tsv
salmon.merged.gene_lengths.tsv
salmon.merged.gene_tpm.tsv
salmon.merged.transcript_counts.tsv
salmon.merged.transcript_lengths.tsv
salmon.merged.transcript_tpm.tsv
tx2gene.tsv

My question is: which of these files should I use as input for DESeq2 (Bioconductor) if I want to run the analysis myself?

NOTE: to run DESeq2 through the nf-core/differentialabundance pipeline, two files are needed (--matrix 'salmon.merged.gene_counts.tsv' \ --transcript_length_matrix 'salmon.merged.gene_lengths.tsv').

DEseq2 • 1.1k views

score 2 · Answer 1 · 2023-11-27

13 months ago

ATpoint 86k

I would use salmon.merged.gene_counts_length_scaled.tsv. This is what tximport returns with the lengthScaledTPM option. In a nutshell, salmon produces transcript-level abundance estimates (counts), and tximport summarizes them to the gene level. Since different samples can express different isoforms of a gene, and those isoforms can have different lengths, a gene can pick up more counts in one sample simply because a longer isoform is expressed there. The lengthScaledTPM option corrects for this: it scales the TPM values by the average transcript length across samples and then rescales each sample to its original library size. The resulting counts can be used directly for your differential analysis, for example with DESeq2. If you care, this is the code section where the file comes from:

https://github.com/nf-core/rnaseq/blob/master/bin/tximport.r#L110
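
For anyone curious what the length scaling amounts to numerically, here is a minimal sketch in plain Python. It is not the tximport source (which is R) and not the nf-core script linked above; the function name `length_scaled_tpm` and the toy numbers are made up for illustration. It assumes three gene-by-sample tables (counts, TPM, and effective gene lengths), multiplies TPM by each gene's mean length across samples, and then rescales every sample so that its total matches the original count total.

```python
def length_scaled_tpm(counts, tpm, lengths):
    """Illustrative length-scaled counts (hypothetical helper, not tximport).

    Each argument is a list of rows (one per gene); each row holds one
    value per sample.
    """
    n_genes = len(counts)
    n_samples = len(counts[0])

    # Average length of each gene across all samples.
    mean_len = [sum(row) / n_samples for row in lengths]

    # TPM weighted by that average length.
    weighted = [[tpm[g][s] * mean_len[g] for s in range(n_samples)]
                for g in range(n_genes)]

    # Per-sample totals before and after weighting.
    counts_sum = [sum(counts[g][s] for g in range(n_genes))
                  for s in range(n_samples)]
    weighted_sum = [sum(weighted[g][s] for g in range(n_genes))
                    for s in range(n_samples)]

    # Rescale each sample back to its original library size.
    scale = [counts_sum[s] / weighted_sum[s] if weighted_sum[s] > 0 else 0.0
             for s in range(n_samples)]
    return [[weighted[g][s] * scale[s] for s in range(n_samples)]
            for g in range(n_genes)]


# Toy data: 2 genes x 2 samples (made-up numbers).
counts = [[100.0, 200.0], [300.0, 100.0]]
tpm = [[400000.0, 700000.0], [600000.0, 300000.0]]
lengths = [[1000.0, 2000.0], [1500.0, 1500.0]]

scaled = length_scaled_tpm(counts, tpm, lengths)
for row in scaled:
    print(row)
```

The point of the sketch is that the per-sample totals stay the same as in the original counts, while the per-gene values no longer depend on which isoform lengths happened to dominate in a given sample.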

Thank you for your response. Do you mean I should pass this TSV file to dds <- DESeqDataSetFromTximport() and follow the Bioconductor workflow, or load it directly with dds <- DESeqDataSetFromMatrix()?