Question

Low pseudo-alignment rate with Kallisto

6.1 years ago

msubramanian1 ▴ 10

Hi,

I am new to bioinformatics and am trying to perform differential expression analysis on some mouse RNA-seq data. We performed Tru-Seq Strand Specific Large Insert RNA Sequencing - High Coverage (50M pairs) on the samples. I am now trying to pseudo-align the reads to the mouse transcriptome using Kallisto. I ran bam2fq to obtain two FASTQ files, and also built a mouse reference transcriptome index from each of Ensembl (Mus_musculus.GRCm38.cdna.all.fa) and the UCSC Genome Browser (refMrna.fa.gz).
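Since the two FASTQ files come from a BAM conversion, one quick sanity check is that mates appear in the same order in both files. The following is a minimal sketch, assuming 4-line FASTQ records and read names that may carry a `/1` or `/2` suffix; the function names are hypothetical and the toy records stand in for the real files.

```python
from itertools import zip_longest


def read_names(lines):
    """Yield the read name from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only the first line of each record holds the name
        fields = line.split()
        if not fields:
            continue  # tolerate a trailing blank line
        name = fields[0][1:]  # drop the leading '@'
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # drop the mate suffix, if present
        yield name


def first_mismatch(fq1_lines, fq2_lines):
    """Return the 0-based index of the first record whose names differ, else None."""
    pairs = zip_longest(read_names(fq1_lines), read_names(fq2_lines))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:  # also triggers if one file has extra records
            return idx
    return None


# Toy data: the second record is out of sync between the two files.
toy_fq1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "ACGT", "+", "IIII"]
toy_fq2 = ["@r1/2", "ACGT", "+", "IIII", "@r3/2", "ACGT", "+", "IIII"]
print(first_mismatch(toy_fq1, toy_fq2))
```

For real data, open file handles for pairA1.fastq and pairA2.fastq can be passed in place of the lists; a result of `None` means every record pairs up by name.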

I ran kallisto using the following command:

    kallisto quant -i index -o output pairA1.fastq pairA2.fastq

For all the samples, the resulting run_info.json output looks similar to the example below:

    "n_targets": 42184,
    "n_bootstraps": 0,
    "n_processed": 73044298,
    "n_pseudoaligned": 33281349,
    "n_unique": 19777682,
    "p_pseudoaligned": 45.6,
    "p_unique": 27.1,
    "kallisto_version": "0.45.0",
    "index_version": 10,
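The two percentages follow directly from the read counts, which makes it easy to compare samples. The following is a minimal sketch that recomputes them from the values above. It assumes the keys are spelled with underscores as kallisto writes them (`n_processed`, `n_pseudoaligned`, `n_unique`) and falls back to the spelling without underscores; the helper names are hypothetical.

```python
import json


def _get(run_info, key):
    """Look up a key, tolerating a copy where the underscores were lost."""
    if key in run_info:
        return run_info[key]
    return run_info[key.replace("_", "")]


def alignment_rates(run_info):
    """Recompute the pseudo-aligned and unique percentages from read counts."""
    processed = _get(run_info, "n_processed")
    return {
        "p_pseudoaligned": round(100 * _get(run_info, "n_pseudoaligned") / processed, 1),
        "p_unique": round(100 * _get(run_info, "n_unique") / processed, 1),
    }


# Counts copied from the run_info.json excerpt in the question.
example = json.loads(
    '{"n_processed": 73044298, "n_pseudoaligned": 33281349, "n_unique": 19777682}'
)
print(alignment_rates(example))  # {'p_pseudoaligned': 45.6, 'p_unique': 27.1}
```

For a real run, `json.load` on the run_info.json file in each output directory yields the same kind of dictionary.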

I would really appreciate any help in troubleshooting this issue. Is it a problem with data quality, or should I be running Kallisto with additional arguments (strand-specific options, etc.)?
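On the strand-specific options: `kallisto quant` accepts `--fr-stranded` and `--rf-stranded`. The following is a minimal sketch that only builds and prints one command per setting, so the resulting pseudo-alignment counts can be compared. It reuses the index and FASTQ names from the command in the question; the output directory names and the helper function are hypothetical. These flags restrict which pseudoalignments are kept, so they would be expected to lower the rate rather than raise it, and the post does not state which orientation matches this library.

```python
import shlex

# Map a short label to the corresponding kallisto strand flag (None = no flag).
STRAND_FLAGS = {
    "unstranded": None,
    "fr": "--fr-stranded",
    "rf": "--rf-stranded",
}


def kallisto_quant_cmd(index, out_dir, fq1, fq2, strand="unstranded"):
    """Build the argument list for a paired-end kallisto quant run."""
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir]
    flag = STRAND_FLAGS[strand]
    if flag is not None:
        cmd.append(flag)
    cmd += [fq1, fq2]
    return cmd


# Print the three commands; nothing is executed here.
for mode in STRAND_FLAGS:
    cmd = kallisto_quant_cmd(
        "index", f"output_{mode}", "pairA1.fastq", "pairA2.fastq", mode
    )
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once kallisto is on the PATH. If one stranded setting retains most of the unstranded pseudoalignments and the other retains very few, that suggests the library orientation.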

Thank you very much for your help and please let me know if I can provide any additional information.




First, I would check for rRNA contamination; there are several threads here discussing methods to do so (e.g. "How to screen for rRNA and gDNA contamination in RNA-seq data?"). RSeQC can also give some useful diagnostics, but you will have to map to the genome to use it.
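As a rough first pass before running dedicated tools, the fraction of reads sharing a k-mer with known rRNA sequences can be estimated on a subsample. The following is a minimal sketch and not a substitute for the methods in the linked threads. The function names are hypothetical, the rRNA reference sequences have to be supplied by the user, and the toy sequences exist only to show the mechanics.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rrna_fraction(reads, rrna_seqs, k=31):
    """Fraction of reads sharing at least one k-mer with any rRNA sequence."""
    reference = set()
    for seq in rrna_seqs:
        reference |= kmers(seq, k)
        reference |= kmers(revcomp(seq), k)  # reads may come from either strand
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if kmers(read, k) & reference)
    return hits / len(reads)


# Toy example with a short k so the made-up sequences can match.
toy_rrna = ["ACGTTGCAAGGCTTAC"]
toy_reads = ["GTTGCAAGG", "GTAAGCCTTG", "TTTTTTTTTT", "CCCCCCCCCC"]
print(rrna_fraction(toy_reads, toy_rrna, k=5))  # 0.5
```

A high fraction on a random subsample of reads would point toward rRNA carry-over as one contributor to the low rate.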

6.1 years ago by h.mon 35k


What else could cause a low pseudo-alignment rate if there is little or no contamination and the strandedness option has been set correctly?