Hi everyone!
I have a short nucleotide sequence (24 nt) whose location in the human genome I know, and I have an RNA-seq transcriptome. I already know that my sequence is expressed in this transcriptome because I found 5 hits for it with grep.
However, I used both Salmon and Kallisto to quantify its expression and I always get 0 TPM, so I suppose I made a mistake in the command lines. Here is what I did with Kallisto:
First, I downloaded the human reference cDNA from Ensembl and manually added my sequence at the beginning of the reference cDNA file, like this:
>ENST...
XXXXXXXXXXXXXXXXXXXXX
I made the index with this reference file and then I launched the quantification with :
kallisto quant -i reference.idx -o Kallisto_files --single -l X -s Y my_transcriptome.fastq
I think I tried everything with the parameters -l and -s.
I replaced "X" with the average length of the sample (-l 100): 0 TPM. With the length of the sequence I am searching for (-l 24): 0 TPM. With 225, which seems to be the standard value when the length is unknown: still 0 TPM. The same goes for the -s parameter.
And it is the same with Salmon. How is this possible? I know there are 5 hits, and this nucleotide sequence is present at a unique location in the human genome. I also looked at the results and 94% of the genes have 0 TPM, but I think this is not a problem, is it?
Thank you for reading, and sorry for my English.
Was that part of a larger sequence, or was it a read that was just 24 bp? At this length you are in miRNA territory, and unless you are specifically using a kit that detects miRNA it would be difficult to say anything with confidence.
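One way to check this is to look at the lengths of the reads that contain the sequence. The following is only a minimal sketch, not a tested pipeline: it assumes an uncompressed FASTQ with strict 4-line records, and the function names `reverse_complement` and `hit_read_lengths` are made up for illustration. It also searches the reverse complement, which a plain grep for the forward sequence would miss.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hit_read_lengths(fastq_lines, query):
    """Count reads containing `query` (or its reverse complement), grouped by read length.

    `fastq_lines` is any iterable of lines, for example an open file handle.
    Assumption: every record spans exactly 4 lines, so the sequence is the
    second line of each record.
    """
    query = query.upper()
    rc_query = reverse_complement(query)
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # skip header, '+' separator and quality lines
            continue
        read = line.strip().upper()
        if query in read or rc_query in read:
            lengths[len(read)] += 1
    return lengths
```

Calling `hit_read_lengths(open("my_transcriptome.fastq"), your_24nt_sequence)` would return a mapping from read length to number of hits. If all hits come from reads of exactly 24 bp, the sequence is a stand-alone short fragment; if they come from longer reads, it is embedded in a larger transcript.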
The read was just 24 bp, and it is a coding RNA (8 aa), so it isn't a miRNA. I don't have more information about this sequence for the moment, but I found it in the genome and 5 times in this transcriptome.
As mentioned in your other thread, most RNA-seq protocols remove small fragments as part of library preparation. If you are convinced that the 5 reads are real, then it is possible that the sample originally had a lot more of them. You will want to confirm that on the experimental side by modifying the library protocol so that the small library fragments are not removed.
miRNA kits directly attach an adapter to the RNA and then specifically select for small size fragments. That is why it may be appropriate to use that method as well.
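Separately, it may be worth checking the k-mer size used for the index, since this alone could explain the 0 TPM. This has not been confirmed for your data. As far as I know, both `kallisto index` and `salmon index` default to k = 31 (set with `-k`), so please verify this in the help output of your installed versions. A sequence shorter than k contains no k-mers at all, which the small sketch below illustrates. The helper name `kmer_count` is made up for the example.

```python
def kmer_count(seq_len, k):
    """Number of k-mers in a sequence of length seq_len (0 if it is shorter than k)."""
    return max(0, seq_len - k + 1)


# A 24 nt target compared against a few candidate k-mer sizes.
# 31 is assumed to be the default k of both tools.
target_len = 24
for k in (31, 21, 15):
    print(f"k={k}: {kmer_count(target_len, k)} k-mers in a {target_len} nt sequence")
```

With k = 31 the 24 nt entry contributes nothing the tools can match against, and 24 bp reads would be too short as well. Rebuilding the index with a smaller odd k might let the reads match, but shorter k-mers are also less specific, so the resulting estimate should be treated with caution.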