Question

DESeq2 input from nf-core/rnaseq

0

18 months ago

وفاء • 0

I ran the nf-core/rnaseq pipeline on a cluster with the human genome, and I got the files listed below as output from STAR and Salmon.

My question is: which file should I use as input to DESeq2? And where is the full file that contains 58,000 genes?

salmon.merged.gene_counts_length.tsv (29744 genes)
salmon.merged.gene_counts_length_scaled.tsv (29744 genes)
salmon.merged.gene_counts_scaled.tsv (29744 genes)
salmon.merged.gene_counts.tsv (29744 genes)
salmon.merged.gene_temp.tsv (29744 genes)

Deseq2 • 2.4k views

ADD COMMENT • link updated 18 months ago by GenoMax 148k • written 18 months ago by وفاء • 0

1

It looks like the Salmon reference you used contained 29744 transcripts. As of this month, there are 19396 protein-coding genes in the human genome.

ADD REPLY • link 18 months ago by GenoMax 148k

0

Thanks GenoMax. So, what is the correct salmon reference that I should use?

ADD REPLY • link 18 months ago by وفاء • 0

0

How did you run the pipeline? Which reference did you use?

ADD REPLY • link 18 months ago by GenoMax 148k

0

I ran the pipeline with Nextflow, and the reference I used was GRCh38.

Where can I download the latest version of the human reference genome? I need a link that I can use with wget on Linux. I need it for RNA-seq analysis.

ADD REPLY • link 18 months ago by وفاء • 0

0

There is probably nothing wrong with the reference you used. It may simply not include every transcript, but it may still be sufficient.

You can find transcript/genome sequence files at:

GENCODE: https://www.gencodegenes.org/human/

Ensembl (only transcripts): http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
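
If you would rather script the download than call wget by hand, here is a minimal Python sketch. It uses the Ensembl cDNA URL from the line above; the `download` helper and its `dest_dir` parameter are hypothetical names chosen for illustration, and the `current_fasta` path points at whichever Ensembl release is current at the time you run it.

```python
import shutil
import urllib.request
from pathlib import Path

# URL copied from the post above (Ensembl, transcripts only).
ENSEMBL_CDNA_URL = (
    "http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/cdna/"
    "Homo_sapiens.GRCh38.cdna.all.fa.gz"
)


def download(url: str, dest_dir: str = ".") -> Path:
    """Fetch `url` into `dest_dir`, keeping the remote file name.

    Hypothetical helper: roughly what `wget <url>` does, without retries
    or resume support.
    """
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Stream the response to disk instead of loading it into memory.
        shutil.copyfileobj(response, out)
    return dest


# Example call (needs network access, so it is left commented out):
# path = download(ENSEMBL_CDNA_URL)
```

The equivalent on the command line is simply wget followed by the same URL.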

ADD REPLY • link 18 months ago by GenoMax 148k

score 2 · Answer 1 · 2023-07-26

2

18 months ago

bioinfo8 ▴ 230

You should use salmon.merged.gene_counts_length_scaled.tsv and here is the reference.

When passing the counts to DESeqDataSetFromMatrix(), apply round() to them first, because Salmon's estimated counts are fractional and DESeq2 expects integers.
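
To make the rounding step concrete, here is a minimal Python sketch. DESeq2 itself is an R package, so this only illustrates preparing the table, not running the analysis. It assumes the file has `gene_id` and `gene_name` columns followed by one column per sample (please check your own header), and `load_rounded_counts` is a hypothetical helper name.

```python
import csv


def load_rounded_counts(path, id_columns=("gene_id", "gene_name")):
    """Read a gene-level counts TSV and round each estimated count.

    Returns the sample names and a dict mapping gene_id to a list of
    integer counts, one per sample, in header order.
    """
    counts = {}
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Every column that is not an identifier is treated as a sample.
        samples = [c for c in reader.fieldnames if c not in id_columns]
        for row in reader:
            counts[row["gene_id"]] = [round(float(row[s])) for s in samples]
    return samples, counts


# Example call (file name taken from the answer above):
# samples, counts = load_rounded_counts("salmon.merged.gene_counts_length_scaled.tsv")
# print(len(counts), "genes x", len(samples), "samples")
```

The number of genes printed here should match the row count you reported, which is a quick way to confirm that all five files share the same gene set.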

ADD COMMENT • link 18 months ago by bioinfo8 ▴ 230

0

Thank you, bioinfo8. Also, I am not sure whether I used the correct reference genome. Where can I download the latest version of the human reference genome? I need a link that I can use with wget on Linux. I need it for RNA-seq analysis.

ADD REPLY • link 18 months ago by وفاء • 0