I ran the nf-core/rnaseq pipeline on a cluster with the human genome, and got the files below in the STAR/Salmon output.
My question is: which file should I use as input to DESeq2? And where is the full file that contains 58,000 genes?
salmon.merged.gene_counts_length.tsv (29744 genes)
salmon.merged.gene_counts_length_scaled.tsv (29744 genes)
salmon.merged.gene_counts_scaled.tsv (29744 genes)
salmon.merged.gene_counts.tsv (29744 genes)
salmon.merged.gene_temp.tsv (29744 genes)
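One way to double-check the gene counts listed above is to count the rows in each table directly. This is only a minimal sketch: it assumes each file is tab-separated, has a single header row, and keeps the gene identifier in the first column. The function name `count_genes` is made up for illustration and is not part of the pipeline.

```python
import csv

def count_genes(tsv_path):
    """Return (number of data rows, number of distinct IDs in the first column)."""
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row (assumed to be present)
        ids = [row[0] for row in reader if row]  # ignore completely empty lines
    return len(ids), len(set(ids))

# Example usage, run in the directory that holds the pipeline output:
# print(count_genes("salmon.merged.gene_counts.tsv"))
```

If the two numbers differ, some identifiers are repeated in the table. If both match the figure reported above, the table really does contain that many genes, and the difference from 58,000 comes from the annotation used rather than from a truncated file.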
Looks like the salmon reference you used contained 29744 transcripts. There are currently 19396 protein-coding genes in the human genome as of this month.
Thanks GenoMax. So, what is the correct salmon reference that I should use?
How did you run the pipeline? Which reference did you use?
I ran the pipeline with Nextflow, and the reference I used was GRCh38.
Where can I download the latest version of the human reference genome? I want to fetch it with wget on Linux, so I need the link. I will use it for RNA-seq analysis.
There is probably nothing wrong with the reference you used. It may simply not include every transcript, but it may still be sufficient.
You can find transcript/genome sequence files at:
GENCODE: https://www.gencodegenes.org/human/
Ensembl (only transcripts): http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
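
If wget is not available, the Ensembl transcript file linked above can also be fetched with a few lines of Python, and the number of transcripts in it can be counted the same way. This is a rough sketch rather than a recommended workflow: the helper names `download` and `count_fasta_records` are made up for illustration, and the URL is the Ensembl link given above, which points at whatever the current release happens to be.

```python
import gzip
import shutil
import urllib.request

# Ensembl cDNA (transcript) FASTA, copied from the link above.
ENSEMBL_CDNA_URL = (
    "http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/cdna/"
    "Homo_sapiens.GRCh38.cdna.all.fa.gz"
)

def download(url, dest):
    """Stream url to dest on disk (roughly what `wget -O dest url` does)."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest

def count_fasta_records(path):
    """Count sequences in a gzipped FASTA file by counting '>' header lines."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))

# Example usage (needs network access, so it is left commented out):
# fasta = download(ENSEMBL_CDNA_URL, "Homo_sapiens.GRCh38.cdna.all.fa.gz")
# print(count_fasta_records(fasta))
```

Counting the header lines gives the number of transcripts in the reference, which can then be compared with the number of entries in the reference the pipeline actually used.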