Question

Issues with newer Kallisto versions (aka 0.46.1)

0

2.9 years ago

Antonio R. Franco ★ 5.2k

We are running an RNA-Seq analysis using Kallisto (0.46.1) and DESeq2. To our surprise, no DE genes were obtained, and we have no idea why.

We asked our sequencing company for a stranded library, but they have not told us which method they used to prepare it. We ran the mapping with the --rf-stranded option because we believe they may have used the dUTP method.

Since we have no confirmation of that (and they don't answer our e-mails), we tried a GitHub pipeline designed to figure out what kind of stranded library you have.

GitHub pipeline to figure out the type of strandedness

To our surprise, that GitHub page includes this sentence:

Sometimes pseudoalignments will not work with newer versions of kallisto. If this is an issue, we suggest downgrading to 0.44.0

Does anybody have any further information about this? The JSON files we got report that around 80% of the reads mapped to our cDNA reference.

Kallisto • 2.7k views

ADD COMMENT • link updated 2.9 years ago by dsull ★ 7.5k • written 2.9 years ago by Antonio R. Franco ★ 5.2k

3

Use salmon with option -l A to infer the library type. Problem solved. Edit: Sorry, I said before "alevin" instead of "salmon", my head was caught in single-cell :-D
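
A minimal sketch of that check, assuming a salmon index built from the same cDNA FASTA. The file names and the two helper functions are hypothetical placeholders, not part of salmon. With `-l A`, salmon records the library type it detected in `lib_format_counts.json` under `expected_format`; `ISR` corresponds to a dUTP-style library (kallisto's `--rf-stranded`), `ISF` to `--fr-stranded`, and `IU` to an unstranded paired-end library.

```shell
# Run salmon with automatic library-type detection (-l A).
# Arguments: index dir, read 1, read 2, output dir (all placeholders).
salmon_autodetect() {
    salmon quant -i "$1" -l A -1 "$2" -2 "$3" -o "$4"
}

# Print the library type salmon detected, e.g. ISR, ISF or IU.
# Assumes the "expected_format" key sits on its own line in the JSON file.
detected_libtype() {
    sed -n 's/.*"expected_format": *"\([A-Za-z]*\)".*/\1/p' "$1/lib_format_counts.json"
}

# Example usage (requires salmon and real FASTQ files):
#   salmon index -t cdna.fa.gz -i salmon_index
#   salmon_autodetect salmon_index sample_R1.fastq.gz sample_R2.fastq.gz salmon_out
#   detected_libtype salmon_out
```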

ADD REPLY • link 2.9 years ago by ATpoint 88k

3

Thank you for your input.

In any case I wanted to know the kind of problem affecting Kallisto.

I am starting to think that salmon is a more robust solution than Kallisto, though.

ADD REPLY • link 2.9 years ago by Antonio R. Franco ★ 5.2k

score 2 · Accepted Answer · 2022-06-21

2

2.9 years ago

dsull ★ 7.5k

Hello!

The --rf-stranded option is indeed used for strand-specific alignments, and it seems you are using it correctly given that 80% of your reads mapped. That's good news! If you use the other strand-specific option, --fr-stranded, you should see significantly fewer reads mapped (do this to make sure; it's a quick and easy check).
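
One way to run that check, sketched under assumptions: the index and FASTQ names are placeholders, and the two helper functions are hypothetical wrappers, not part of kallisto. kallisto writes the percentage of pseudoaligned reads to `run_info.json` as `p_pseudoaligned`, so the two runs can be compared directly.

```shell
# Quantify one paired-end sample with a given strand flag.
# Arguments: index, strand flag (--rf-stranded or --fr-stranded),
# output dir, read 1, read 2. All paths are placeholders.
quant_stranded() {
    kallisto quant -i "$1" "$2" -o "$3" "$4" "$5"
}

# Print the percentage of pseudoaligned reads recorded in run_info.json.
# Assumes one key per line, which is how kallisto lays out the file.
pseudoaligned_pct() {
    sed -n 's/.*"p_pseudoaligned": *\([0-9.]*\).*/\1/p' "$1/run_info.json"
}

# Example usage (requires kallisto and real FASTQ files):
#   quant_stranded transcriptome.idx --rf-stranded out_rf sample_R1.fastq.gz sample_R2.fastq.gz
#   quant_stranded transcriptome.idx --fr-stranded out_fr sample_R1.fastq.gz sample_R2.fastq.gz
#   echo "rf: $(pseudoaligned_pct out_rf)%  fr: $(pseudoaligned_pct out_fr)%"
```

If the library really is dUTP-style, the `--fr-stranded` run should report a much lower percentage than the `--rf-stranded` run.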

As for the comment "Sometimes pseudoalignments will not work with newer versions of kallisto.", you will not have this problem with kallisto 0.46.1. kallisto 0.46.2 had a bug that causes a seg fault when --genomebam (required by that linked pipeline) is used. The latest version of kallisto (0.48.0) and 0.46.1 do not have this issue.

As for no DE genes being obtained, that's an entirely different issue altogether. Do a search of this forum and you'll find many discussions that show you how to make plots and perform analyses that can diagnose the issue of no DE genes.

ADD COMMENT • link 2.9 years ago by dsull ★ 7.5k

0

Thank you for your input

This is not the only potential problem I found.

On the Kallisto pages, you can download a zipped file containing the whole mouse cDNA FASTA file, which has 118,489 sequences, whereas the transcript_to_genes.txt file has 141,862 transcript-to-gene rows. The transcriptome.idx (the index) is OK and truly corresponds to the FASTA file.

ADD REPLY • link 2.9 years ago by Antonio R. Franco ★ 5.2k

2

That's not a kallisto issue; it's an Ensembl GTF / FASTA mismatch issue as described here: https://fromsystosys.netlify.app/2020/01/31/comparing-ensembl-gtf-and-cdna/

The cDNA FASTA ends up not containing certain transcripts. This is perfectly fine.
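
To see which transcripts are affected, something like the following sketch can list the IDs that appear in the transcript-to-gene table but not in the FASTA. It assumes an uncompressed FASTA whose headers start with the transcript ID, a tab-separated table with the transcript ID in column 1, and IDs written the same way (with or without version suffix) in both files; the function name is hypothetical.

```shell
# Print transcript IDs listed in a transcript-to-gene table (column 1)
# that have no matching header in a cDNA FASTA.
# Arguments: uncompressed FASTA file, tab-separated t2g file.
missing_from_fasta() {
    ids=$(mktemp)
    # First word of each header line, without the leading ">".
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u > "$ids"
    # comm -13 keeps only lines unique to the second input (the t2g IDs).
    cut -f1 "$2" | sort -u | comm -13 "$ids" -
    rm -f "$ids"
}

# Example usage (placeholder file names):
#   missing_from_fasta cdna.fa transcripts_to_genes.txt | wc -l
```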

If you want, you can make your own reference transcriptome easily using the kb ref command from the kb-python package (simply feed it the genome FASTA and GTF that you want to create a transcriptome for). This is what I always do because this avoids all the GTF/FASTA mismatch confusion AND you can always use the latest genome+annotation. Building an index is fast.

ADD REPLY • link 2.9 years ago by dsull ★ 7.5k

0

I reconstructed the transcript-to-gene names file by using Linux commands to extract that information from each of the FASTA headers.

I don't know if you are involved in the making of this zipped index file. If so, I can send it to you if you want. Contact me at arfranco uco.es
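
The exact commands are not given in this thread; the following is only a sketch of one way to do it, assuming Ensembl-style cDNA headers of the form `>TRANSCRIPT_ID cdna ... gene:GENE_ID ... gene_symbol:NAME description:...` and an uncompressed FASTA. The function name is hypothetical.

```shell
# Write "transcript<TAB>gene<TAB>symbol" for every header of an
# Ensembl-style cDNA FASTA. Argument: uncompressed FASTA file.
t2g_from_fasta() {
    awk '/^>/ {
        tx = substr($1, 2); gene = ""; symbol = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^description:/) break          # free text follows, stop parsing
            if ($i ~ /^gene:/)        gene = substr($i, 6)
            if ($i ~ /^gene_symbol:/) symbol = substr($i, 13)
        }
        if (symbol == "") symbol = gene              # fall back to the gene ID
        print tx "\t" gene "\t" symbol
    }' "$1"
}

# Example usage (placeholder file names):
#   t2g_from_fasta cdna.fa > t2g_from_fasta.txt
```

A table built this way has exactly one row per FASTA record, so its row count matches the number of sequences in the index.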

ADD REPLY • link 2.9 years ago by Antonio R. Franco ★ 5.2k

1

I know what the files look like -- it's an Ensembl issue where transcripts are not found in the Ensembl cDNA FASTA file but they exist in the GTF annotation file -- those transcripts will not be indexed but will appear in the .txt file. If you're bothered by it, you can use kb-python to make your own index and cDNA FASTA rather than relying on the cDNA FASTA from Ensembl. In fact, you should do this anyway because those zipped files are from several years ago (try to use the latest annotation whenever possible). Here's an example of how kb-python works:

pip install kb-python
wget ftp://ftp.ensembl.org/pub/release-101/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-101/gtf/homo_sapiens/Homo_sapiens.GRCh38.101.gtf.gz
kb ref -i human_index.idx -g human_t2g.txt -f1 transcriptome_human.fa Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz Homo_sapiens.GRCh38.101.gtf.gz

Then run kallisto on the generated human_index.idx file.

ADD REPLY • link 2.9 years ago by dsull ★ 7.5k

0

Re: Your "robust solution" comment above.

Both programs produce nearly equivalent quantifications and have pretty much the same runtimes for standard RNA-seq. Both programs are actively maintained and under active development.

If there's anything not "robust" about the latest stable version (0.48.0) of kallisto, please post the issues. But kallisto is a robust solution.

ADD REPLY • link 2.9 years ago by dsull ★ 7.5k

0

Independent of the Salmon vs. Kallisto debate, a stable release seg faulting every time a normal operation is performed means it wasn't covered by a build test. Or was this limited to specific circumstances?

ADD REPLY • link 2.9 years ago by rpolicastro 13k

0

As explained above, this only happens with the --genomebam option (a non-conventional, non-recommended auxiliary use case for kallisto and one that is not part of the standard pseudoalignment workflow). Thus, the statement that it is "seg faulting every time a normal operation is performed" is false. For all standard workflows, all the stable versions of kallisto produce results as expected. The standard workflows (quant and bustools) are tested thoroughly for every major stable release. Again, if you encounter any issues, please post them.

Hope this clarifies.

ADD REPLY • link 2.9 years ago by dsull ★ 7.5k