Hello Guys,
@Michael Love
I have a transcriptomics dataset and ran the nf-core/rnaseq pipeline with the salmon-star option. The output of the salmon-star folder is as follows:
My question is: which of these files should be the input for DESeq2 (Bioconductor) if I want to run the analysis on my own?
NOTE: to run DESeq2 through the nf-core/differentialabundance pipeline, we have to use 2 files (--matrix 'salmon.merged.gene_counts.tsv' \
--transcript_length_matrix 'salmon.merged.gene_lengths.tsv').
I would use salmon.merged.gene_counts_length_scaled.tsv. This is what tximport returns with the lengthScaledTPM option. In a nutshell, salmon produces estimates of transcript-level abundances (counts), and tximport summarizes them to the gene level. Since different samples can express different isoforms of a gene, and those isoforms can have different lengths, tximport corrects for counts that are larger only because of longer isoforms. The lengthScaledTPM option rescales the abundances (TPM) by the average transcript length across samples and then to each sample's library size, so the resulting counts already account for this and you can use the mentioned file directly for your differential analysis, for example with DESeq2. If you care, this is the code section where the file comes from:
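To make the lengthScaledTPM idea more concrete, here is a minimal base-R sketch of the scaling step. It is not the pipeline's code, only an illustration of the logic as I understand it from tximport; the function name length_scaled_counts and all numbers are made up for the example.

```r
# Sketch of the lengthScaledTPM idea (hypothetical helper, toy data).
# All three inputs are gene x sample matrices with the same dimensions.
length_scaled_counts <- function(counts, abundance, lengths) {
  # Scale each gene's TPM by its average length across samples
  new_counts <- abundance * rowMeans(lengths)
  # Convert each sample (column) to proportions ...
  new_counts <- sweep(new_counts, 2, colSums(new_counts), "/")
  # ... and scale back up to that sample's original library size
  sweep(new_counts, 2, colSums(counts), "*")
}

dn <- list(c("geneA", "geneB"), c("s1", "s2"))
counts    <- matrix(c(100, 300, 200, 400), nrow = 2, dimnames = dn)  # estimated counts
abundance <- matrix(c(50, 150, 80, 120), nrow = 2, dimnames = dn)    # TPM-like values
lengths   <- matrix(c(1000, 2000, 1500, 2000), nrow = 2, dimnames = dn)  # effective lengths

scaled <- length_scaled_counts(counts, abundance, lengths)
print(scaled)
print(colSums(scaled))  # compare with colSums(counts)
```

The column sums of the result match the original library sizes, while the per-gene values no longer depend on which isoform length a given sample happened to use.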
Thank you for your response. Do you mean I should pass this tsv file to dds <- DESeqDataSetFromTximport() and follow the Bioconductor workflow, or pass it directly to dds <- DESeqDataSetFromMatrix()?
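Use DESeqDataSetFromMatrix to construct the dds. The tsv is a plain gene-by-sample count matrix, not a tximport object, so DESeqDataSetFromTximport does not apply here.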
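A minimal sketch of how that could look in R. Some assumptions on my side: the file has a gene_id column, a gene_name column, and then one column per sample; the helper names prepare_counts and build_dds are hypothetical; and the design uses a condition column that you would define yourself. Note that DESeqDataSetFromMatrix expects integer counts, so the length-scaled values are rounded first.

```r
# Turn the merged table into an integer gene x sample matrix.
# Assumed layout: gene_id, gene_name, then one column per sample.
prepare_counts <- function(df, id_col = "gene_id", drop_cols = "gene_name") {
  sample_cols <- setdiff(colnames(df), c(id_col, drop_cols))
  mat <- as.matrix(df[, sample_cols, drop = FALSE])
  rownames(mat) <- df[[id_col]]
  mat <- round(mat)               # DESeq2 requires integer counts
  storage.mode(mat) <- "integer"
  mat
}

# Build the DESeqDataSet; coldata needs one row per sample, in the same
# order as the sample columns, with a 'condition' column (assumed name).
build_dds <- function(path, coldata) {
  tab <- read.delim(path, check.names = FALSE)
  counts <- prepare_counts(tab)
  stopifnot(identical(rownames(coldata), colnames(counts)))
  DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                 colData = coldata,
                                 design = ~ condition)
}

# Usage (sample names and conditions are placeholders for your own):
# coldata <- data.frame(condition = factor(c("ctrl", "ctrl", "trt", "trt")),
#                       row.names = c("S1", "S2", "S3", "S4"))
# dds <- build_dds("salmon.merged.gene_counts_length_scaled.tsv", coldata)
# dds <- DESeq2::DESeq(dds)
# res <- DESeq2::results(dds)
```

Because the counts are already length-scaled, no extra gene-length offset is supplied here.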