Question

Problem in Salmon when indexing Rattus_norvegicus

0

10 months ago

dzisis1986 ▴ 70

I am trying to create the index for Rattus_norvegicus.mRatBN7 with Salmon. I am following the steps described in https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/. I downloaded the Rattus genome and the cDNA and ncRNA transcripts (Rattus_norvegicus.mRatBN7_cdna_ncrna.fa.gz, Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz). I also tried the example described in the manual and got the same error.

Everything starts correctly when I run:

salmon index -t gentrome.fa.gz -d decoys.txt -p 12 -i salmon_index --gencode

But I get the following error:

[screenshot of the error output; image not available]

Do you know what the positionsKilled error means and how to solve it? In a previous post, I found that "Killed" is related to memory. I am running on a small server with 12 cores and 32 GB of memory.

salmon • 958 views

ADD COMMENT • link updated 10 months ago by dsull ★ 6.9k • written 10 months ago by dzisis1986 ▴ 70

score 1 · Answer 1 · 2024-01-16

1

10 months ago

GenoMax 147k

Actually, "positionsKilled" is two lines of output run together: "positions" followed by "Killed".

So "Killed" in this case is still likely referring to memory. Please do not use the toplevel genome file; use the primary assembly instead. The toplevel file contains haplotypes etc. (see "Why is human genome FASTA file on GENCODE much smaller than that on ENSEMBL?"), and you do not need or want those in a decoy.
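To get a feel for how much extra sequence a toplevel file carries compared with a primary assembly, one quick check is to count the FASTA records in each. This is only a sketch: the function name `count_fasta_records` and the file name `genome.primary.fa.gz` are made up for illustration.

```shell
# Count the FASTA records (header lines starting with ">") in a gzipped FASTA file.
# Prints a single integer.
count_fasta_records() {
    gunzip -c "$1" | grep -c '^>'
}

# Example calls (the second file name is hypothetical):
#   count_fasta_records Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz
#   count_fasta_records genome.primary.fa.gz
```

A large difference between the two counts would suggest that the toplevel file includes many extra sequences that the decoy does not need.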

ADD COMMENT • link 10 months ago by GenoMax 147k

0

Yes, but on the Ensembl FTP site (https://ftp.ensembl.org/pub/release-111/fasta/rattus_norvegicus/dna/) there is only toplevel, and primary is available only per chromosome, not as a single file! In GENCODE there is only the genome and transcripts for mouse (M34), not for Rattus_norvegicus. The transcripts were also hard to find, so I had to create them by concatenating the cdna and ncrna FASTA files. If you can give me links to the proper primary genome and transcript files, that would be great, and I will try with those.

ADD REPLY • link 10 months ago by dzisis1986 ▴ 70

0

Do you propose concatenating the per-chromosome files (Rattus_norvegicus.mRatBN7.2.dna.primary_assembly1, 2, 3, ... 20.fa.gz) into one file and trying again?

ADD REPLY • link 10 months ago by dzisis1986 ▴ 70

0

Sure. Try catting the primary files into one.
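A minimal sketch of that step, assuming the per-chromosome files are gzipped FASTA files sitting in the current directory. The function name `build_genome_and_decoys` and the output names are hypothetical, and the glob in the usage comment is a guess at the Ensembl naming, so adjust it to the files you actually downloaded. The decoy list is derived the same way as in the linked tutorial: the sequence names of the genome, without the leading ">".

```shell
# Concatenate per-chromosome gzipped FASTA files into one genome file and
# write the sequence names to a decoy list.
# Usage: build_genome_and_decoys OUT_GENOME.fa.gz OUT_DECOYS.txt CHR1.fa.gz [CHR2.fa.gz ...]
build_genome_and_decoys() {
    local genome_out="$1"
    local decoys_out="$2"
    shift 2
    # Gzip files can be concatenated directly; the result is still a valid gzip file.
    cat "$@" > "$genome_out"
    # Keep each header up to the first space and drop the leading ">".
    gunzip -c "$genome_out" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' > "$decoys_out"
}

# Example call (the glob is an assumed naming pattern, not checked against the FTP site):
#   build_genome_and_decoys genome.primary.fa.gz decoys.txt \
#       Rattus_norvegicus.mRatBN7.2.dna.primary_assembly.*.fa.gz
```

The shell expands the glob in lexicographic order (1, 10, 11, ...), which should not matter here, since the decoy list is built from whatever order ends up in the combined file.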

ADD REPLY • link 10 months ago by GenoMax 147k

0

What about the transcripts file? Is it OK to concatenate the latest versions of the cdna and ncrna FASTA files? And what about the memory issue: do you think it will be fine on this particular machine?

ADD REPLY • link 10 months ago by dzisis1986 ▴ 70

0

Yes, you would cat the cdna and ncrna together. I never got why Ensembl even separates them. GENCODE for example just has a single transcripts file that contains everything. 32GB should actually be enough for such a genome. Try monitoring RAM and see if that is really the reason.
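As a rough sketch of how the pieces fit together, the combined reference ("gentrome") can be built by concatenating the transcripts first and the decoy genome last, which is the order the linked tutorial uses. The function name `build_gentrome` and the file names in the usage comment are placeholders, not the exact Ensembl names.

```shell
# Build the combined "gentrome": cdna and ncrna transcripts first, decoy genome last.
# All inputs are gzipped FASTA files, so plain concatenation yields a valid gzip file.
# Usage: build_gentrome OUT.fa.gz CDNA.fa.gz NCRNA.fa.gz GENOME.fa.gz
build_gentrome() {
    local out="$1"
    local cdna="$2"
    local ncrna="$3"
    local genome="$4"
    cat "$cdna" "$ncrna" "$genome" > "$out"
}

# Example call (placeholder file names):
#   build_gentrome gentrome.fa.gz cdna.all.fa.gz ncrna.fa.gz genome.primary.fa.gz
```

The resulting gentrome.fa.gz would then go into the same `salmon index` command as in the question. To check memory, on Linux with GNU time installed, prefixing that command with `/usr/bin/time -v` should print a "Maximum resident set size" line at the end of the run, which gives the peak RAM the process reached before it finished or was killed.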

ADD REPLY • link 10 months ago by ATpoint 85k

1

I completely agree about Ensembl, but GENCODE has only human and mouse. I tried the salmon example on the same machine (32 GB RAM, 8 cores) and it didn't work, whereas on my Mac M2 with 16 GB RAM it worked after some hours. Very strange!

ADD REPLY • link 10 months ago by dzisis1986 ▴ 70

0

FWIW, as an aside, I find it better to just take the GTF and genome FASTA file and directly extract the "ncrna" and the so-called (unfortunately termed) "cdna" rather than download a transcriptome directly.

(In fact, scRNA-seq tools that map to the transcriptome do that automatically for you).

ADD REPLY • link 10 months ago by dsull ★ 6.9k