Salmon index problem
12 months ago
enee

Hello,

I'm trying to use Salmon in mapping-based mode. I downloaded the full-decoy Salmon index listed on refgenie with the command refgenie pull hg38/salmon_sa_index, and it downloaded the full folder locally.

Now the index folder and the files SRR21898893_1.fastq.gz and SRR21898893_2.fastq.gz are all in the same folder, and I run this command:

docker run -v "$(pwd):/data" --rm combinelab/salmon salmon quant -i data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default -l A -1 SRR21898893_1.fastq.gz -2 SRR21898893_2.fastq.gz --validateMappings -o transcripts_quant

I get this output:

[output screenshot]

The output suggests that the file versionInfo.json does not exist, but it is present in the index folder:

{ "indexVersion": 5, "hasAuxIndex": false, "auxKmerLength": 31, "indexType": 2, "salmonVersion": "1.2.1" }

What could be the problem?

Salmon docker RNAseq index refgenie

12 months ago

Maybe there is something wrong with your index path:

Can you list the file with the following command:

ls -alrth 2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default/versionInfo.json
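Note that ls runs on the host, while salmon runs inside the container, where -v "$(pwd):/data" bind-mounts the host's current directory at /data. A host path $(pwd)/X is therefore seen by salmon as /data/X. The helper below is a hypothetical sketch (container_path is not part of Docker or salmon) that does this translation, assuming a single bind mount and absolute paths without symlinks; passing absolute container paths to -i, -1 and -2 also avoids depending on the container's working directory.

```shell
#!/bin/sh
# Hypothetical helper: translate a host path into the path seen inside a
# container started with: docker run -v "<host_dir>:<mount_point>" ...
# Assumes a single bind mount and absolute paths without symlinks.
container_path() {
  host_dir=${1%/}      # mounted host directory, e.g. "$PWD" (trailing slash dropped)
  mount_point=${2%/}   # mount target inside the container, e.g. /data
  host_file=$3         # absolute host path of the file or folder
  case "$host_file" in
    "$host_dir"/*)
      # Swap the host prefix for the mount point.
      printf '%s\n' "$mount_point/${host_file#"$host_dir"/}"
      ;;
    *)
      echo "error: $host_file is not under $host_dir" >&2
      return 1
      ;;
  esac
}

# If the index folder sits in a data/ subfolder of the current directory:
# container_path "$PWD" /data "$PWD/data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default"
# prints /data/data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default
```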

You could try one of the following values for -i, keeping the rest of your docker run ... combinelab/salmon salmon quant command unchanged:

-i data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index
-i /data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default
-i 2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default
-i 2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index
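To try candidates like these systematically on the host, a small loop can report which directory actually contains versionInfo.json, the file salmon reported as missing. This is a sketch: find_salmon_index is a hypothetical helper name, and it only checks for that single file, not whether the rest of the index is complete.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that contains
# versionInfo.json (the file salmon reported as missing), plus its contents.
# Only checks for that one file; it does not validate the rest of the index.
find_salmon_index() {
  for dir in "$@"; do
    if [ -f "$dir/versionInfo.json" ]; then
      printf 'found: %s\n' "$dir"
      cat "$dir/versionInfo.json"
      return 0
    fi
    printf 'missing: %s/versionInfo.json\n' "$dir" >&2
  done
  return 1
}

# Example, run from the host folder that is mounted into the container:
# find_salmon_index \
#   data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default \
#   2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default
```

Whatever host path this reports still has to be translated to its location under the container's mount point before it is passed to -i.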

Or, even better, run Salmon without Docker by building it from source, downloading the prebuilt binary, or installing it with conda or apt:

https://salmon.readthedocs.io/en/latest/building.html#installation

https://github.com/COMBINE-lab/salmon/releases/download/v1.10.0/salmon-1.10.0_linux_x86_64.tar.gz

conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda create -n salmon salmon

or even:

sudo apt install salmon

Thank you, it was a problem with my index path (the correct one was data/data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default). However, now I get Exception : [std::bad_alloc], probably because the index I used is 18 GB while my MacBook has only 16 GB of memory.
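To sanity-check that diagnosis before a long run, one can compare the index's on-disk size with the memory available. This is a rough heuristic that assumes salmon needs at least about the index's on-disk size in RAM to load it; check_index_fits is a hypothetical helper, and the budget must be given in KB (on macOS, sysctl -n hw.memsize reports physical RAM in bytes). When salmon runs under Docker Desktop on a Mac, the relevant budget is the memory assigned to Docker's Linux VM, which can be lower than the machine's physical RAM.

```shell
#!/bin/sh
# Rough pre-flight check (heuristic): does the on-disk size of a salmon index
# exceed a memory budget? Assumes RAM needed >= index size on disk.
index_kb() {
  # du -sk prints "<size in KB><TAB><path>"; keep only the number.
  du -sk "$1" | awk '{ print $1 }'
}

check_index_fits() {
  index_dir=$1   # path to the salmon index directory
  budget_kb=$2   # available memory in KB
  size_kb=$(index_kb "$index_dir")
  if [ "$size_kb" -gt "$budget_kb" ]; then
    echo "index is ${size_kb} KB, more than the ${budget_kb} KB budget"
    return 1
  fi
  echo "index is ${size_kb} KB, within the ${budget_kb} KB budget"
}

# Example on macOS, using physical RAM (bytes / 1024) as the budget:
# check_index_fits data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default \
#   "$(( $(sysctl -n hw.memsize) / 1024 ))"
```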

There is a second index, described as "Transcriptome index for salmon, produced with salmon index using partial selective alignment method. Preparation includes transcriptome mapping to the genome and extraction of the relevant portion out from the genome and indexing it along with the transcriptome. Recipe source -- https://github.com/COMBINE-lab/SalmonTools/blob/master/scripts/generateDecoyTranscriptome.sh", while the one I used previously is described as "Transcriptome index for salmon, produced with salmon index using selective alignment method. Improves quantification accuracy compared to the regular index."

In practice, how do the results differ between the two indexes?

You should use selective alignment whenever possible. From the Salmon documentation:

Selective alignment, first introduced by the --validateMappings flag in salmon, and now the default mapping strategy (in version 1.0.0 forward), is a major feature enhancement introduced in recent versions of salmon. When salmon is run with selective alignment, it adopts a considerably more sensitive scheme that we have developed for finding the potential mapping loci of a read, and score potential mapping loci using the chaining algorithm introduced in minimap2 [5]. It scores and validates these mappings using the score-only, SIMD, dynamic programming algorithm of ksw2 [6]. Finally, we recommend using selective alignment with a decoy-aware transcriptome, to mitigate potential spurious mapping of reads that actually arise from some unannotated genomic locus that is sequence-similar to an annotated transcriptome. The selective-alignment algorithm, the use of a decoy-aware transcriptome, and the influence of running salmon with different mapping and alignment strategies is covered in detail in the paper Alignment and mapping methodology influence transcript abundance estimation.
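For reference, the Salmon documentation describes building a full decoy-aware index by concatenating the transcriptome and the genome into one "gentrome" FASTA (transcripts first, genome last) and passing the genome sequence names as decoys to salmon index -d. The first step, extracting decoy names from the genome FASTA headers, is sketched below. make_decoy_list and the file names genome.fa.gz and transcripts.fa.gz are placeholders, and the salmon index call is left as a comment because it needs salmon itself and a lot of memory.

```shell
#!/bin/sh
# Sketch of the first step in building a decoy-aware salmon index:
# list the genome's sequence names to use as decoys.
make_decoy_list() {
  # $1 = genome FASTA, plain or gzip-compressed. Prints one name per line:
  # keep header lines, take the first space-separated token, drop the ">".
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//'
}

# Remaining steps (placeholder file names; not run here):
# make_decoy_list genome.fa.gz > decoys.txt
# cat transcripts.fa.gz genome.fa.gz > gentrome.fa.gz   # decoy sequences go last
# salmon index -t gentrome.fa.gz -d decoys.txt -i salmon_index
```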

The use of selective alignment implies the use of range factorization, as mapping scores become very meaningful with this option. Selective alignment can improve the accuracy, sometimes considerably, over the faster, but less-precise mapping algorithm that was previously used. Also, there are a number of options and flags that allow the user to control details about how the scoring is carried out, including setting match, mismatch, and gap scores, and choosing the minimum score below which an alignment will be considered invalid, and therefore not used for the purposes of quantification.

https://salmon.readthedocs.io/en/latest/salmon.html
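As a worked example of the minimum-score cutoff mentioned in the quoted documentation: salmon's --minScoreFraction option expresses the threshold as a fraction of the optimal possible alignment score. The sketch below assumes that optimum is a perfect, gap-free match of every base of a single read (read length times the match score set with --ma); the values in the example (100 bp, match score 2, fraction 0.65) are illustrative, so check salmon's quant help output for the defaults of your version.

```shell
#!/bin/sh
# Illustrative cutoff for selective alignment, assuming
#   min_score = fraction * (match_score * read_length),
# i.e. a fraction of the score of a perfect, gap-free match of one read.
min_alignment_score() {
  read_len=$1   # read length in bases
  match=$2      # score per matching base (salmon's --ma)
  fraction=$3   # salmon's --minScoreFraction
  awk -v l="$read_len" -v m="$match" -v f="$fraction" \
    'BEGIN { printf "%g\n", f * m * l }'
}

# A 100 bp read, match score 2, fraction 0.65:
# min_alignment_score 100 2 0.65   # prints 130
```

Under these assumptions, a 100 bp read whose best alignment scores below 130 would be treated as invalid and not used for quantification.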
