I am trying to create the index for Rattus_norvegicus.mRatBN7 with Salmon. I am following the steps described in https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/. I downloaded the Rattus genome and the cDNA and ncRNA transcripts (Rattus_norvegicus.mRatBN7_cdna_ncrna.fa.gz and Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz). I also tried the example described in the manual and got the same error.
Everything starts correctly when I run:
salmon index -t gentrome.fa.gz -d decoys.txt -p 12 -i salmon_index --gencode
But I get the following error:
Do you know what the positionsKilled error means and how to solve it?
In a previous post, I found that Killed is related to memory. I am running on a small server with 12 cores and 32 GB of memory.
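For readers following along, here is a minimal sketch of how the two inputs to the command above (decoys.txt and gentrome.fa.gz) are typically prepared in the decoy-aware workflow: the decoy list is the set of genome sequence names, and the gentrome is the transcripts followed by the genome. The function name make_gentrome and the input file names are placeholders, not something from the thread.

```shell
# Hypothetical helper: build decoys.txt and gentrome.fa.gz in the current directory.
# $1 = gzipped transcript FASTA, $2 = gzipped genome FASTA (placeholder names).
make_gentrome() {
    transcripts_gz="$1"
    genome_gz="$2"
    # Decoys are the genome sequence names: keep header lines, drop the
    # description after the first space, and strip the leading '>'.
    gunzip -c "$genome_gz" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' > decoys.txt
    # Transcripts must come first, the genome (decoys) after them.
    cat "$transcripts_gz" "$genome_gz" > gentrome.fa.gz
}

# Usage:
#   make_gentrome transcripts.fa.gz genome.fa.gz
```

The names in decoys.txt have to match the genome headers in the gentrome exactly, which is why the description after the first space is cut off.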
Yes, but on the Ensembl FTP (https://ftp.ensembl.org/pub/release-111/fasta/rattus_norvegicus/dna/) there are only toplevel files, and primary files exist only per chromosome, not for the whole genome! GENCODE only has the genome and transcripts for mouse (M34), not for Rattus_norvegicus. The transcripts were also hard to find, so I had to create them by concatenating the cdna and ncrna FASTA files. If you can give me links to proper primary genome and transcript files, that would be great and I will try with those. Do you suggest concatenating the per-chromosome files Rattus_norvegicus.mRatBN7.2.dna.primary_assembly1, 2, 3 ... 20.fa.gz into one file and trying again?
Sure. Try catting the primary files into one.
What about the transcripts file? Is it OK to concatenate the latest versions of the cdna and ncrna FASTA files? And what about the memory issue: do you think this machine will be fine?
Yes, you would cat the cdna and ncrna together. I never got why Ensembl even separates them. GENCODE for example just has a single transcripts file that contains everything. 32GB should actually be enough for such a genome. Try monitoring RAM and see if that is really the reason.
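A minimal sketch of the concatenation step, assuming the downloads are gzip-compressed FASTA files. Concatenated gzip streams form a valid gzip file, so nothing has to be decompressed first. The helper name concat_gz and all file names in the usage comments are placeholders.

```shell
# Hypothetical helper: join several gzipped FASTA files into one.
# $1 = output file, remaining arguments = input files.
concat_gz() {
    out="$1"
    shift
    # Plain byte-level concatenation; gunzip reads the result as one stream.
    cat "$@" > "$out"
}

# Usage (placeholder names, adjust to the files you downloaded):
#   concat_gz transcripts.fa.gz cdna.fa.gz ncrna.fa.gz
#   concat_gz genome.primary.fa.gz chr1.fa.gz chr2.fa.gz chr3.fa.gz
```

Make sure the output file is not itself matched by a glob you pass as input, otherwise cat would try to read the file it is writing.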
I completely agree about Ensembl, but GENCODE has only human and mouse. I tried the salmon example on the same machine (32 GB RAM, 8 cores) and it didn't work, whereas on my Mac M2 with 16 GB RAM it worked after some hours. Very strange!
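One way to check whether memory really is the cause on the Linux server is to record the peak memory of the indexing run and look for an out-of-memory kill in the kernel log. This is only a sketch: it assumes GNU time is installed at /usr/bin/time, the log file name is a placeholder, and peak_rss_kb is a hypothetical helper that just parses the log.

```shell
# Hypothetical helper: print the peak resident memory (in kB) recorded
# in a log written by GNU time's verbose mode.
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ {print $2}' "$1"
}

# Usage (assumes GNU time; the log name is a placeholder):
#   /usr/bin/time -v -o index_time.log \
#       salmon index -t gentrome.fa.gz -d decoys.txt -p 12 -i salmon_index
#   peak_rss_kb index_time.log
#
# If the process was killed, the kernel log usually says why
# (reading it may require root):
#   dmesg | grep -i 'killed process'
```

If the reported peak is close to the 32 GB of the machine, or dmesg shows salmon being killed, memory is the likely culprit. Running with fewer threads is one thing to try, though whether that lowers peak memory enough here is not something the thread confirms.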
FWIW, as an aside, I find it better to just take the GTF and genome FASTA file and directly extract the "ncrna" and the so-called (unfortunately termed) "cdna" rather than download a transcriptome directly.
(In fact, scRNA-seq tools that map to the transcriptome do that automatically for you).
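As a sketch of the GTF-plus-genome approach, one commonly used tool is gffread, which can write the spliced sequence of every transcript in an annotation. This is my choice of tool, not one named in the thread. The wrapper name extract_transcripts and the file names are placeholders, and the genome FASTA is assumed to be uncompressed.

```shell
# Hypothetical wrapper around gffread (must be installed separately).
# $1 = uncompressed genome FASTA, $2 = GTF annotation, $3 = output FASTA.
extract_transcripts() {
    genome_fa="$1"
    gtf="$2"
    out_fa="$3"
    # -g: genome sequence to extract from; -w: write spliced exon
    # sequences for each transcript in the annotation.
    gffread -w "$out_fa" -g "$genome_fa" "$gtf"
}

# Usage (placeholder names):
#   extract_transcripts genome.fa annotation.gtf transcripts.fa
```

Because the transcripts come from the same genome and annotation you will use downstream, sequence names and versions stay consistent.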