Building an index with kallisto keeps getting killed
23 months ago
pubsurfted ▴ 40

Hello,

I have been trying to create a kallisto index using the following command:

kallisto index -i Glycine-Max.idx Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz 

It starts running but is soon killed:

[build] loading fasta file Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 4 target sequences
[build] warning: replaced 64340 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... Killed

What causes this error, and how can I fix it?

Edit: I'm using how-are-we-stranded-here, which depends on a kallisto index. I know there are other tools for building an index, but I'm limited to kallisto.

Thank you for any replies and best wishes.


"Killed" usually means the operating system terminated the process because it ran out of memory. What is your setup, and can you get access to a computer with more memory?
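
One way to check whether memory is the problem is sketched below. It assumes a Linux system with /proc/meminfo; the helper name mem_gb is made up for this example, and reading the kernel log with dmesg may require root on some systems.

```shell
# Print a /proc/meminfo field (given in kB) converted to GiB. Linux only.
mem_gb() {
    # $1 is a field name such as MemTotal or MemAvailable
    awk -v key="$1:" '$1 == key { printf "%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

echo "Total RAM:     $(mem_gb MemTotal) GiB"
echo "Available RAM: $(mem_gb MemAvailable) GiB"

# Look for recent out-of-memory kills in the kernel log.
# Prints nothing if there are none or if the log cannot be read.
dmesg 2>/dev/null | grep -i -E 'out of memory|killed process' | tail -n 5 || true
```

If the available memory is well below what the index build needs, or the kernel log shows kallisto among the killed processes, memory is the likely cause.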


I'm a BS student, so I currently cannot afford to add more memory to my peasant computer.

23 months ago
Michael 55k

Based on dsull's proposal, I have made a notebook that creates the index:

https://colab.research.google.com/drive/1Rjl4ncMjqtp9EMUkxc0D7pHvijg94Yjb?usp=sharing

Because the index has already been generated, you can download it directly (note: you need a Google account). However, the download will take a while, so you might want to consider doing your whole analysis in Colab. I have added a gzip step because downloads from Colab can be slow, so you need to gunzip the .idx file if you download it this way.
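
A minimal sketch of the gunzip step after downloading is shown below. The function name unpack_index is hypothetical, and the file name Glycine-Max.idx.gz in the usage note is an assumption based on the index name used in this thread; adjust it to whatever the notebook actually produces.

```shell
# Decompress a gzipped kallisto index, keeping the .gz file as a backup.
unpack_index() {
    gz="$1"                       # path to the downloaded .gz file
    [ -f "$gz" ] || { echo "not found: $gz" >&2; return 1; }
    gzip -t "$gz" || return 1     # fail early if the download is truncated or corrupt
    gunzip -k "$gz"               # writes the same path without the .gz suffix
}
```

For example, `unpack_index Glycine-Max.idx.gz` would leave Glycine-Max.idx next to the archive. Running `kallisto inspect Glycine-Max.idx` afterwards is one way to confirm that kallisto can read the file.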

23 months ago
dsull ★ 6.9k

What version of kallisto are you using? You can check with kallisto --version . Make sure you're using kallisto 0.48.0 (the latest version).

Running the following works just fine on my laptop (a MacBook Pro with 16 GB of RAM):

wget ftp.ensemblgenomes.org/pub/plants/current/fasta/glycine_max/cdna/Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz
kallisto index -i Glycine-Max.idx Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz

And outputs the following:

[build] loading fasta file Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 4 target sequences
[build] warning: replaced 64340 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 1310952 contigs and contains 95464598 k-mers 

It takes 5 minutes 23 seconds, and peak memory usage is 5.03 GB.
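
To get a comparable figure on another machine, peak memory can be measured by wrapping the command in GNU time. This is only a sketch: it assumes /usr/bin/time is GNU time (as on most Linux systems; the macOS version uses -l instead of -v and formats its report differently), and the function name peak_rss_kb is made up for this example.

```shell
# Run a command under GNU time and print its peak resident memory in kB.
peak_rss_kb() {
    log=$(mktemp)
    # -v requests the verbose report, -o writes that report to a file;
    # the wrapped command's stdout is discarded, its stderr is left visible.
    /usr/bin/time -v -o "$log" "$@" >/dev/null
    awk -F': ' '/Maximum resident set size/ { print $2 }' "$log"
    rm -f "$log"
}
```

For example, `peak_rss_kb kallisto index -i Glycine-Max.idx Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz` prints the peak in kilobytes once the build finishes. If the build is killed partway through, the number only reflects memory used up to that point.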


Hello, thank you for taking the time to reply to my post.

My goal is to run the how-are-we-stranded-here tool, and one of its dependencies is kallisto 0.44.0. So I have to use that version, but it keeps getting stuck at the kallisto indexing step. :(


That tool should be compatible with indices generated by kallisto 0.48.0 (the latest version). The format of the index has not changed between 0.44.0 and 0.48.0 (although the latest version probably includes some optimizations and bug fixes in the index generation step).

Nonetheless, if your computer has less than 6 GB of memory available, why not just run things on Google Colab? Google Colab, which is free, comes with more than enough memory to generate kallisto indices. You can install kallisto on Google Colab, generate the index using the command supplied above, and then download the index file from Google Colab.

https://colab.research.google.com/
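
For someone new to Colab, the three steps (install, build, download) could look roughly like the sketch below, pasted into a notebook cell that starts with %%bash. Several things here are assumptions rather than anything confirmed in this thread: the release URL pattern on the kallisto GitHub page, the layout of the unpacked archive, and the helper names kallisto_url and build_index.

```shell
VERSION="0.48.0"
FASTA="Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz"

# Assumed naming of the prebuilt Linux binary on the kallisto releases page.
kallisto_url() {
    echo "https://github.com/pachterlab/kallisto/releases/download/v$1/kallisto_linux-v$1.tar.gz"
}

build_index() {
    # 1. Install: fetch and unpack the prebuilt binary
    #    (assumed to unpack to ./kallisto/kallisto).
    wget -q "$(kallisto_url "$VERSION")"
    tar -xzf "kallisto_linux-v${VERSION}.tar.gz"

    # 2. Build: same download and index command as posted above.
    wget -q "ftp.ensemblgenomes.org/pub/plants/current/fasta/glycine_max/cdna/${FASTA}"
    ./kallisto/kallisto index -i Glycine-Max.idx "$FASTA"

    # 3. Compress a copy of the index so the download is smaller.
    gzip -k Glycine-Max.idx
}
```

Adding a final line that calls `build_index` runs all three steps. The resulting Glycine-Max.idx.gz should then appear in the Colab file browser (the folder icon in the left sidebar), where it can be downloaded and unpacked with gunzip.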


Hi dsull, running this in colab is a great idea, but it may require some explanation for someone new to colab.
