I'm trying to use Salmon in mapping-based mode. I downloaded the full decoy salmon index from the refgenie list with the command refgenie pull hg38/salmon_sa_index, which downloaded the full folder locally.
Now I have this index folder together with SRR21898893_1.fastq.gz and SRR21898893_2.fastq.gz in the same folder, and I run this command:
docker run -v "$(pwd):/data" --rm combinelab/salmon salmon quant -i data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default -l A -1 SRR21898893_1.fastq.gz -2 SRR21898893_2.fastq.gz --validateMappings -o transcripts_quant
I have this output:
It appears that salmon cannot find the file versionInfo.json; however, the file is present in the index folder:
Thank you, it was a problem with my index path (the correct one was data/data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default). Now, however, I get Exception : [std::bad_alloc], probably because the index is 18GB while my MacBook has only 16GB of memory.
There is a second index with this description: "Transcriptome index for salmon, produced with salmon index using partial selective alignment method. Preparation includes transcriptome mapping to the genome and extraction of the relevant portion out from the genome and indexing it along with the transcriptome. Recipe source -- https://github.com/COMBINE-lab/SalmonTools/blob/master/scripts/generateDecoyTranscriptome.sh". The description of the one I used previously is: "Transcriptome index for salmon, produced with salmon index using selective alignment method. Improves quantification accuracy compared to the regular index."
In practice, how do the results differ depending on the two indexes?
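Regarding the path fix mentioned above, here is a minimal sketch of why the doubled data/ prefix shows up. With -v "$(pwd):/data", the host working directory appears inside the container as /data, so an index stored under ./data/ on the host ends up under /data/data/ in the container. That the index sits under ./data/ on the host is an assumption inferred from the corrected path, and the relative form data/data/... presumably worked because the container's working directory is /. The helper name host_to_container and the host directory /home/user/project are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a host path that lies under a bind-mounted
# directory into the path the same file has inside the container.
#   $1 = host directory given on the left of -v  (e.g. "$(pwd)")
#   $2 = container directory on the right of -v  (e.g. /data)
#   $3 = host path to translate
host_to_container() {
  local host_root="${1%/}" container_root="${2%/}" host_path="$3"
  case "$host_path" in
    "$host_root"/*)
      # Strip the host prefix and prepend the container mount point.
      printf '%s/%s\n' "$container_root" "${host_path#"$host_root"/}"
      ;;
    *)
      echo "path is not under the mounted directory: $host_path" >&2
      return 1
      ;;
  esac
}

# Example: made-up host directory /home/user/project mounted at /data.
host_to_container /home/user/project /data \
  /home/user/project/data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default
```

The same mapping applies to the FASTQ files and the output directory: any path passed to salmon is resolved inside the container, not on the host.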
You should try to use selective alignment whenever possible:
Selective alignment, first introduced by the --validateMappings flag in salmon, and now the default mapping strategy (in version 1.0.0 forward), is a major feature enhancement introduced in recent versions of salmon. When salmon is run with selective alignment, it adopts a considerably more sensitive scheme that we have developed for finding the potential mapping loci of a read, and score potential mapping loci using the chaining algorithm introduced in minimap2 [5]. It scores and validates these mappings using the score-only, SIMD, dynamic programming algorithm of ksw2 [6]. Finally, we recommend using selective alignment with a decoy-aware transcriptome, to mitigate potential spurious mapping of reads that actually arise from some unannotated genomic locus that is sequence-similar to an annotated transcriptome. The selective-alignment algorithm, the use of a decoy-aware transcriptome, and the influence of running salmon with different mapping and alignment strategies is covered in detail in the paper Alignment and mapping methodology influence transcript abundance estimation.
The use of selective alignment implies the use of range factorization, as mapping scores become very meaningful with this option. Selective alignment can improve the accuracy, sometimes considerably, over the faster, but less-precise mapping algorithm that was previously used. Also, there are a number of options and flags that allow the user to control details about how the scoring is carried out, including setting match, mismatch, and gap scores, and choosing the minimum score below which an alignment will be considered invalid, and therefore not used for the purposes of quantification.
https://salmon.readthedocs.io/en/latest/salmon.html
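To illustrate the scoring options mentioned in the quoted documentation, the sketch below assembles (but does not run) a salmon quant command that sets some of them explicitly. The flag names --ma (match score), --go (gap open), --ge (gap extend) and --minScoreFraction are given as I recall them from the salmon help; the numeric values are placeholders chosen only for illustration, not recommendations, so please check salmon quant --help-reads for your version before relying on any of them. The index path and read files are the ones from the question.

```shell
#!/usr/bin/env bash
# Build the command as an array so it can be inspected before it is run.
# All numeric values below are illustrative placeholders (an assumption),
# not tuned or recommended settings.
index_dir="data/data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default"

cmd=(salmon quant
  -i "$index_dir"
  -l A
  -1 SRR21898893_1.fastq.gz
  -2 SRR21898893_2.fastq.gz
  --ma 2                      # score for a matching base
  --go 6 --ge 2               # gap open and gap extension penalties
  --minScoreFraction 0.65     # alignments scoring below this fraction of the best possible score are discarded
  -o transcripts_quant)

# A mismatch penalty option (--mp) also exists; its sign convention is best
# confirmed in the help output, so it is left out of this sketch.

# Print the assembled command instead of executing it.
printf '%s ' "${cmd[@]}"
echo
```

To actually run it inside the container, the printed command would go after docker run -v "$(pwd):/data" --rm combinelab/salmon, as in the original command.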