Question

No expression found with Salmon and Kallisto

Asked 17 months ago by firefox91

Hi everyone!

I have a short nucleotide sequence (24 nt) whose location in the human genome I know, and I have an RNA-seq transcriptome. I already know that my sequence is expressed in this transcriptome because I found 5 hits for it with grep.
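A plain grep on a FASTQ file only finds reads that carry the sequence on the forward strand, while a quantifier such as kallisto (in its default, unstranded mode) also matches reads against the reverse complement. The sketch below counts reads containing a query on either strand. The helper names and the toy reads are hypothetical, and it assumes a standard 4-line FASTQ layout with uppercase A/C/G/T/N bases.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_reads_with(fastq_lines, query):
    """Return (forward_hits, reverse_hits) for reads containing query.

    Assumes a 4-line FASTQ layout (header, sequence, '+', qualities), so the
    sequence is every 4th line starting at index 1. A read is counted once:
    on the forward strand if it matches there, otherwise on the reverse.
    """
    query = query.upper()
    rc = reverse_complement(query)
    forward = reverse = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:
            continue  # skip header, '+' and quality lines
        seq = line.strip().upper()
        if query in seq:
            forward += 1
        elif rc in seq:
            reverse += 1
    return forward, reverse


# Toy reads (made up): r1 carries GATTACA, r2 carries its reverse complement.
example_reads = [
    "@r1", "CCGATTACACC", "+", "IIIIIIIIIII",
    "@r2", "AATGTAATCAA", "+", "IIIIIIIIIII",
    "@r3", "CCCCCCCCCC", "+", "IIIIIIIIII",
]
print(count_reads_with(example_reads, "GATTACA"))
```

On the toy reads this prints (1, 1). For a real file, an open file handle can be passed instead of the list, since iterating it yields the lines one by one.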

However, when I quantify its expression with either Salmon or Kallisto, I always get 0 TPM, so I suppose I made a mistake in my command lines. Here is what I did with Kallisto:

First, I downloaded the human reference cDNA from Ensembl and manually added my sequence at the beginning of the reference cDNA file, like this:

>ENST...
XXXXXXXXXXXXXXXXXXXXX

I built the index from this reference file and then ran the quantification with:

kallisto quant -i reference.idx -o Kallisto_files --single -l X -s Y my_transcriptome.fastq

I think I tried every option for the -l and -s parameters.

I replaced "X" with the average read length of the sample (-l 100): 0 TPM. With the length of the sequence I am searching for (-l 24): 0 TPM. With 225, which seems to be the standard value when it is unknown: still 0 TPM. The same happened when I varied -s.

I get the same result with Salmon. How is this possible? I know there are 5 hits, and this nucleotide sequence occurs at a single location in the human genome. Looking at the results, 94% of the genes have 0 TPM, but I don't think that is a problem, is it?

Thank you for reading, and sorry for my English.

Tags: salmon, rna-seq, transcriptome, kallisto

I already know that my sequence is expressed in this transcriptome because I found 5 hits for it with a grep.

Was that part of a larger sequence, or was the read itself just 24 bp? At this length you are in miRNA territory, and unless you are using a kit designed to capture miRNAs, it would be difficult to say anything with confidence.

Reply by GenoMax, 17 months ago

The read was just 24 bp, and it is a coding RNA (8 aa), so it isn't a miRNA. I don't have more information about this sequence for the moment, but I found it in the genome and 5 times in this transcriptome.

Reply by firefox91, 17 months ago

As mentioned in your other thread, most RNA-seq protocols remove small fragments as part of library preparation. If you are convinced that the 5 reads are real, the sample may originally have contained many more of them. You will want to confirm that experimentally by modifying the library protocol so that small fragments are not removed.

miRNA kits attach an adapter directly to the RNA and then specifically select for small fragments, which is why that method may be appropriate here as well.

Reply by GenoMax, 17 months ago

score 2 · Accepted Answer · 2023-06-23

Answered 17 months ago by dsull

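This is because kallisto and salmon use a default k-mer length of 31 (k=31). Your sequence, and the reads carrying it, are only 24 nt long, shorter than k, so they contain no 31-mer that could be matched against the index. You need to rebuild the index with a smaller k-mer length (e.g. k=23, via the -k option of kallisto index or salmon index).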
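To see why k matters, here is a simplified sketch, not kallisto's or salmon's actual code: both tools match reads through length-k substrings (k-mers) stored in their index, and a read shorter than k has none to look up. The function name kmers and the toy 24-nt read are illustrative assumptions.

```python
def kmers(seq, k):
    """Return every overlapping length-k substring of seq.

    A sequence shorter than k yields an empty list: there is nothing to look up.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


read = "ACGTTGCAGGCTAACCGTATGCAT"  # hypothetical 24-nt read, same length as the poster's

for k in (31, 23):
    print(f"k={k}: {len(kmers(read, k))} k-mer(s)")
```

With k=31 the read produces no k-mers at all; with k=23 it produces two (24 - 23 + 1), which is what makes it matchable again.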

It worked, thanks!

Reply by firefox91, 17 months ago

Sorry to come back to this, but why not use a k-mer length of 24 instead of 23? And for the -l parameter (length of the reads), should I use the length of my transcript or the typical value of 200? Thanks

Reply by firefox91, 17 months ago

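Because k must be an odd number. An even-length k-mer can be identical to its own reverse complement (e.g. ACGT), which complicates a de Bruijn graph that treats each k-mer and its reverse complement as the same node. With an odd k this cannot happen, because the middle base would have to be its own complement.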

See https://bioinformatics.stackexchange.com/questions/156/why-do-some-assemblers-require-an-odd-length-kmer-for-the-construction-of-de-bru
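A quick brute-force check of the odd-k argument, as a sketch with hypothetical helper names: it counts the length-k DNA words that equal their own reverse complement for small k.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_self_reverse_complement(k):
    """Count the length-k DNA words that equal their own reverse complement."""
    count = 0
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        if word == reverse_complement(word):
            count += 1
    return count


for k in range(1, 7):
    print(k, count_self_reverse_complement(k))
```

Every odd k gives 0, while every even k gives 4^(k/2) such words (4, 16, 64 for k = 2, 4, 6), since the first half of the word fixes the second half.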

Reply by dsull, 17 months ago

Thanks again!

Reply by firefox91, 17 months ago

As for the -l parameter, it is the fragment length: not the read length and not the transcript length. It is how long your library fragments are, which can really only be determined from a Bioanalyzer trace or inferred from paired-end reads. -l mainly affects the effective length and therefore the TPMs; if you're doing differential expression, you're really only interested in the estimated counts anyway.

For short species such as microRNAs, the -l parameter does not really apply, because they are not fragmented. So I'd just set -l 1.
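A toy sketch of why -l changes TPMs but not counts, assuming the simplified textbook effective length L_eff = L - mu + 1, where mu is the mean fragment length. kallisto and salmon derive effective lengths from the whole fragment-length distribution, so this is an illustration only. The floor at 1 for transcripts shorter than mu is a placeholder assumption, and the counts and lengths are made up.

```python
def effective_length(length, mean_frag_len):
    """Toy effective length L - mu + 1, floored at 1.

    The floor is a placeholder assumption for transcripts shorter than the
    mean fragment length; real tools handle that case in their own way.
    """
    return max(length - mean_frag_len + 1, 1.0)


def tpm(counts, lengths, mean_frag_len):
    """Convert estimated counts to TPM: rate = count / eff_len, scaled to sum to 1e6."""
    rates = [c / effective_length(length, mean_frag_len)
             for c, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        # Nothing assigned at all (e.g. no read matched the index): every TPM is 0.
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]


counts = [100, 100]    # hypothetical estimated counts (the same whatever -l is)
lengths = [2000, 500]  # hypothetical transcript lengths in nt

for mu in (1, 200):
    print(mu, [round(x) for x in tpm(counts, lengths, mu)])
```

With mu = 1 the effective lengths equal the transcript lengths and the TPMs come out as 200000 and 800000; with mu = 200 the shorter transcript's share grows, while the input counts are untouched.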

Reply by dsull, 17 months ago