I'm new to kallisto and have a newbie question. If I want to get "gene counts" from a pseudobam file (EDIT: I meant to type "abundance file", not pseudobam file), is it as simple as mapping gene IDs to transcript IDs using the GTF file that the transcript reference is based on? What's the difference between doing this and using the --genomebam and --gtf options in kallisto quant to project the transcript alignments to genome coordinates? I did the latter and the only additional file I got is a pseudoalignments.bam.bai file; the abundance file looks the same.
I thought that it was not super straightforward to get gene counts for a transcript quantification tool like kallisto vs. a traditional aligner like STAR or bowtie2, but I know my knowledge is outdated.
Ah ok. Thanks! Also, sorry I mistyped - I meant "If I want to get gene counts from the abundance file..."
So I can do that the first way ("mapping the gene ID to transcript ID using the gtf file based off the transcript reference"), using the abundance file?
Basically, you just sum the TPM abundances of all transcripts associated with a particular gene to get gene-level abundances.
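As a rough illustration of that summing step, here is a minimal Python sketch using only the standard library. It assumes the usual abundance.tsv header (target_id, length, eff_length, est_counts, tpm) and a headerless, tab-separated mapping file with the transcript ID in the first column and the gene ID in the second. The function names read_t2g and sum_by_gene and the tiny in-memory tables are made up for the example, and it sums est_counts alongside TPM since the question was about counts.

```python
import csv
import io
from collections import defaultdict


def read_t2g(handle):
    """Read a headerless, tab-separated transcript-to-gene table.

    Assumption: column 1 is the transcript ID, column 2 is the gene ID.
    Any further columns are ignored.
    """
    t2g = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            t2g[row[0]] = row[1]
    return t2g


def sum_by_gene(abundance_handle, t2g):
    """Sum est_counts and tpm over all transcripts of each gene."""
    genes = defaultdict(lambda: {"est_counts": 0.0, "tpm": 0.0})
    for row in csv.DictReader(abundance_handle, delimiter="\t"):
        gene = t2g.get(row["target_id"])
        if gene is None:
            continue  # transcript missing from the mapping; skipped here
        genes[gene]["est_counts"] += float(row["est_counts"])
        genes[gene]["tpm"] += float(row["tpm"])
    return dict(genes)


# Tiny made-up tables standing in for abundance.tsv and the mapping file.
abundance = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t10\t5.0\n"
    "tx2\t1500\t1300\t30\t7.5\n"
    "tx3\t900\t700\t4\t2.5\n"
)
mapping = io.StringIO("tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n")

gene_totals = sum_by_gene(abundance, read_t2g(mapping))
for gene, totals in sorted(gene_totals.items()):
    print(gene, totals["est_counts"], totals["tpm"])
```

With real files you would pass open file handles instead of the StringIO objects. Note that this sketch silently skips any transcript whose target_id does not match the mapping exactly (for example because of a version suffix), so it is worth checking that the IDs line up first.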
For what it's worth, I recommend building kallisto indices using the kb-python package:
pip install kb-python
and then running kb ref on the genomic FASTA and GTF, which will output the kallisto index, the transcriptome FASTA, and the gene-to-transcript mapping.