Dear community, I am running the nf-core/rnaseq pipeline:
sudo nextflow run nf-core/rnaseq \
--input microsheet.csv \
--outdir rnaseq \
--skip_alignment \
--pseudo_aligner salmon \
--fasta references/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--gtf references/ensembl/Homo_sapiens.GRCh38.111.gtf \
--transcript_fasta references/ensembl/Homo_sapiens.GRCh38.cdna.all.fa \
--max_memory 50GB \
--max_cpus 18 \
-profile docker
I got the following error:
ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE
Caused by:
  Missing output file(s) `*.tsv` expected by process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE (Homo_sapiens.GRCh38.dna.primary_assembly.filtered.gtf)`
Command error:
  __main__ - 2024-03-18 11:10:51,695 WARNING: No attribute in GTF matching transcripts
  __main__ - 2024-03-18 11:10:51,695 ERROR: Failed to map transcripts to genes.
My references come from Ensembl. Checking the files, I discovered that the GTF file contains transcript IDs without a version suffix, e.g. transcript_id "ENST00000511072", while the transcript names in my counts, generated from the transcriptome reference, include one, e.g. ENST00000390469.2.
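One way to confirm that the version suffix is the only difference is to compare the GTF `transcript_id` values with the `Name` column of a Salmon `quant.sf`, once as-is and once with the trailing `.<digits>` removed on both sides. The sketch below is a hypothetical diagnostic helper written for this thread (the function names are made up; it is not part of nf-core/rnaseq) and assumes the usual Ensembl GTF attribute layout and the standard tab-separated `quant.sf` header.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical diagnostic helper, not part of nf-core/rnaseq.
TRANSCRIPT_ID = re.compile(r'\btranscript_id "([^"]+)"')
VERSION_SUFFIX = re.compile(r"\.\d+$")  # e.g. the ".2" in "ENST00000390469.2"


def gtf_transcript_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect every transcript_id value from the attribute column (9th) of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def quant_names(quant_lines: Iterable[str]) -> Set[str]:
    """Collect the Name column (first column) of a Salmon quant.sf, skipping its header row."""
    lines = iter(quant_lines)
    next(lines, None)  # header: Name, Length, EffectiveLength, TPM, NumReads
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def strip_version(tx_id: str) -> str:
    """Drop a trailing .<digits> version suffix, if present."""
    return VERSION_SUFFIX.sub("", tx_id)


def overlap(gtf_ids: Set[str], salmon_ids: Set[str]) -> Tuple[int, int]:
    """Return (exact matches, matches after stripping versions on both sides)."""
    exact = len(gtf_ids & salmon_ids)
    stripped = len({strip_version(t) for t in gtf_ids} & {strip_version(t) for t in salmon_ids})
    return exact, stripped
```

Passing the opened GTF and one sample's `quant.sf` to `gtf_transcript_ids` and `quant_names` and then calling `overlap` should, if the version suffix is the culprit, give an exact count of 0 and a large count after stripping.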
I can't find an Ensembl GTF file whose transcript IDs include the version suffix (.1, .2, etc.). Could the version suffix be causing the error? I am surprised that the pipeline doesn't check for this.
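Ensembl GTFs do carry the version, just not inside `transcript_id`: it sits in a separate `transcript_version` attribute. A minimal sketch, assuming that attribute is present, of building a transcript-to-gene table whose IDs match the versioned cDNA names is shown below. The function `versioned_tx2gene` is hypothetical and is not what the pipeline's TX2GENE process runs; it only produces a standalone table.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical helper; matches GTF attributes of the form: key "value";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn the 9th GTF column into a dict (repeated keys such as 'tag' keep the last value)."""
    return {key: value for key, value in ATTRIBUTE.findall(field)}


def versioned_tx2gene(gtf_lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (transcript_id.transcript_version, gene_id, gene_name), one row per transcript."""
    rows = []
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only 'transcript' features are needed; exon/CDS lines repeat the same IDs.
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        tx = attrs.get("transcript_id")
        if tx is None:
            continue
        version = attrs.get("transcript_version")
        tx_versioned = f"{tx}.{version}" if version else tx
        if tx_versioned in seen:
            continue
        seen.add(tx_versioned)
        rows.append((tx_versioned, attrs.get("gene_id", ""), attrs.get("gene_name", "")))
    return rows
```

Written out as a TSV, such a table could, for example, be supplied as the `tx2gene` argument of tximport in R to summarise the Salmon output outside the pipeline; it does not change how nf-core/rnaseq itself behaves.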
Any advice is much appreciated. Thank you!
Since you are using salmon, you should not need the GTF file. Can you try taking that out?
It is specified in the nf-core docs that I need it:
I tried running it to confirm and got:
That said, I did obtain the quant.sf files from Salmon; it is the TX2GENE step that fails.
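An alternative workaround, untested here with this pipeline version, is to remove the version suffix from the cDNA FASTA headers before passing the file as `--transcript_fasta`, so the names Salmon indexes (the first whitespace-delimited token of each header) match the unversioned `transcript_id` values in the Ensembl GTF. The sketch below assumes Ensembl-style headers such as `>ENST00000390469.2 cdna chromosome:...`; `strip_header_versions` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Hypothetical helper: captures the ID up to its last ".<digits>" suffix, which must be
# followed by whitespace or the end of the line.
HEADER_VERSION = re.compile(r"^(>\S+?)\.\d+(?=\s|$)")


def strip_header_versions(fasta_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines unchanged, except header IDs lose a trailing .<digits> suffix."""
    for line in fasta_lines:
        if line.startswith(">"):
            line = HEADER_VERSION.sub(r"\1", line, count=1)
        yield line
```

Writing `strip_header_versions(open(...))` to a new file with `writelines` and pointing `--transcript_fasta` at that file means the Salmon index has to be rebuilt, and gene-level results would then be keyed on unversioned transcript IDs.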
Did you ever manage to solve this? I'm running into the exact same problem...