Hi All,
I am new to RNA-seq analysis and am currently trying to quantify TPM using Salmon. I read the manual and decided to use the alignment-based quantification mode. As far as I understand, with this method I have to align my reads to the transcriptome. For the transcripts, I converted the .gff file to FASTA using gffread. Then I obtained the transcriptome alignments using STAR with the --quantMode TranscriptomeSAM and --outSAMtype BAM SortedByCoordinate options. However, when I run the Salmon command from the manual, I get these errors:
[2024-03-27 09:57:03.269] [jointLog] [warning] Transcript "*****" appears in the reference but did not appear in the BAM
[2024-03-27 09:57:03.298] [jointLog] [critical] Transcript "****" appeared in the BAM header, but was not in the provided FASTA file
This is the code I used for quantification:
cd /media/scratchpad_01/guest21
/media/bulk_01/users/guest21/miniconda3/bin/salmon quant -t /media/scratchpad_01/guest21/path/to//file_gtf.fa \
-l A \
-a /media/scratchpad_01/guest21/path/to/Aligned.toTranscriptome.out.bam \
-o /media/scratchpad_01/guest21/Output_RNAseq_30mpi
I should add that I am working on my university's cluster and downloaded the transcript/genome files from NCBI.
Did you check the fasta headers in your fasta file and the reference names that are appearing in the BAM? Looks like they are not matching.
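One way to run this check is sketched below. It is only a sketch under assumptions: the function names are made up for illustration, the sequence name is taken to be the text between ">" and the first whitespace, and the BAM header is expected on stdin as text (for example from samtools view -H).

```shell
# List transcript names from a FASTA file: the text after ">" up to the first whitespace.
fasta_names() {
    grep '^>' "$1" | awk '{ print substr($1, 2) }' | LC_ALL=C sort -u
}

# List reference names from a SAM/BAM header read on stdin (the SN: field of @SQ lines).
sam_header_names() {
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' |
        LC_ALL=C sort -u
}

# Print names found in only one of two sorted name lists.
# Unindented lines: only in the first list. Tab-indented lines: only in the second list.
compare_names() {
    LC_ALL=C comm -3 "$1" "$2"
}
```

Typical use, with the file names from the original post: fasta_names file_gtf.fa > fasta_names.txt, then samtools view -H Aligned.toTranscriptome.out.bam | sam_header_names > bam_names.txt, then compare_names fasta_names.txt bam_names.txt. Empty output means the two sets of names agree.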
Indeed, the FASTA file and the BAM file have different headers. In the FASTA, some characters are added. Is there a way to make the names in these two files match?
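If the added characters are a description that follows the first space in each FASTA header (an assumption, since the actual headers are not shown in this thread), a minimal sketch for trimming them could look like this. The function name is hypothetical, and a prefix added in front of the name would need a different substitution.

```shell
# Keep only the first whitespace-delimited token of each FASTA header line.
# Sequence lines are passed through unchanged.
trim_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$1"
}
```

For example, trim_fasta_headers file_gtf.fa > file_gtf.trimmed.fa writes a copy with shortened headers and leaves the original file untouched.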
I always wondered why people do alignment-based mode. Just use salmon directly on your fastq files and quantify against the transcriptome (see manual of salmon) -- no advantage to me with this alignment-based mode. Just more steps to perform.
To be honest, I find it difficult to understand how to do it. I would have to create a decoy file following generateDecoyTranscriptome.sh, but that seems a difficult task to me.
While using a decoy is recommended, it is not necessary, especially if you are running into problems. Just use the mapping-based mode as described here: https://salmon.readthedocs.io/en/latest/salmon.html#quantifying-in-mapping-based-mode
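For orientation, the two steps of mapping-based mode without decoys can be sketched roughly as follows. This assumes paired-end reads, which the thread does not state; the function name and all file names are placeholders.

```shell
# Build a salmon index from a transcript FASTA, then quantify paired-end reads against it.
# Arguments: transcript FASTA, index directory, read 1 FASTQ, read 2 FASTQ, output directory.
quant_mapping_mode() {
    transcripts=$1
    index_dir=$2
    reads1=$3
    reads2=$4
    out_dir=$5
    # Step 1: index the transcriptome (no decoys in this sketch).
    salmon index -t "$transcripts" -i "$index_dir" &&
    # Step 2: quantify directly from the FASTQ files; -l A lets salmon infer the library type.
    salmon quant -i "$index_dir" -l A -1 "$reads1" -2 "$reads2" -o "$out_dir"
}
```

A call might look like quant_mapping_mode transcripts.fa salmon_index sample_R1.fastq.gz sample_R2.fastq.gz salmon_out. For single-end reads, the -1/-2 pair would be replaced by -r with a single FASTQ file.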
Thanks. In the meantime, I was able to make a decoy using the following code:
However, when I run this code:
I get the following error:
I compressed my transcript.fa with the gzip command. Could this affect the file?
I can get the quant.sf file in the end, but is it accurate if Salmon does not detect the decoy?
Did you check to make sure the file is there? Also, it is possible that by simply cutting the names after the first space you may have lost other parts of the names.
Again, decoys are recommended but not essential for salmon. The quant file you got should be usable.
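Both checks mentioned here (is the decoy list there, and do the names still match after cutting at the first space) can be sketched as a small helper. The function name and the file names decoys.txt and gentrome.fa.gz are assumptions for illustration, since the poster's actual commands are not shown.

```shell
# Print decoy names that do not match any sequence name in the FASTA read from stdin.
# Returns 1 if the decoy list is missing or empty.
missing_decoys() {
    if [ ! -s "$1" ]; then
        echo "decoy list is missing or empty: $1" >&2
        return 1
    fi
    # First input: the decoy list (one name per line). Second input: FASTA on stdin.
    awk 'NR == FNR { wanted[$0] = 1; next }
         /^>/ { delete wanted[substr($1, 2)] }
         END { for (name in wanted) print name }' "$1" - | sort
}
```

For a gzipped FASTA, this could be run as gzip -cd gentrome.fa.gz | missing_decoys decoys.txt. Empty output means every decoy name was found among the FASTA sequence names.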