I am currently trying to use kallisto and IGV to visualize reads from a single-cell RNA-seq experiment. I'm interested in non-coding RNA, but I would still like to see how the reads align to each chromosome. Currently, I can visualize how the reads align to each individual non-coding transcript by aligning to the Ensembl non-coding RNA FASTA. But when I tried using --genomebam with the Ensembl GTF file, I no longer saw any reads when viewing them against the genome FASTA. The command below successfully generates my transcript index:
! kallisto index -i transcripts.idx Homo_sapiens.GRCh38.ncrna.fa.gz
I then run the command below to generate a BAM file for IGV, so that I can see how the reads align to the genome:
! kallisto quant --genomebam --gtf Homo_sapiens.GRCh38.99.gtf.gz -i transcripts.idx -o output 1.fastq 2.fastq
The BAM file is generated successfully, but I don't see any alignments in IGV.
For what it's worth, I get the below warning:
Warning: 173489 transcripts were defined in GTF file, but not in the index
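One way to see where this warning comes from is to compare the transcript IDs in the GTF with the sequence IDs in the FASTA used to build the index. The index here was built from the ncRNA FASTA only, while the GTF describes all transcripts (protein-coding included), so a large count is expected on that basis alone; a version-suffix mismatch (Ensembl FASTA headers may carry a ".N" version that the GTF's transcript_id lacks) is another possible cause. This is a rough stand-in for kallisto's own check, not a copy of it: I'm assuming kallisto matches the first token of each FASTA header against transcript_id, and I don't know how it treats version suffixes, so the script reports both an exact and a version-insensitive count. It also assumes Ensembl-style headers (GENCODE headers are pipe-delimited and would need extra parsing). The function names are my own.

```python
import gzip
import re


def open_text(path):
    """Open a plain or gzip-compressed text file (chosen by the .gz extension)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_ids(lines):
    """IDs of all FASTA records: the first whitespace-delimited token after '>'."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """transcript_id values from the 'transcript' feature lines of a GTF."""
    ids = set()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "transcript":
            match = TRANSCRIPT_ID.search(fields[8])  # column 9 holds the attributes
            if match:
                ids.add(match.group(1))
    return ids


def strip_version(tx_id):
    """Drop a trailing '.N' version, e.g. 'ENST00000456328.2' -> 'ENST00000456328'."""
    return tx_id.split(".", 1)[0]


def compare_ids(index_ids, gtf_ids):
    """Count GTF transcripts absent from the index, with and without version suffixes."""
    exact = gtf_ids - index_ids
    unversioned = {strip_version(t) for t in gtf_ids} - {strip_version(t) for t in index_ids}
    return {
        "gtf": len(gtf_ids),
        "index": len(index_ids),
        "missing_exact": len(exact),
        "missing_ignoring_version": len(unversioned),
    }


# Usage (file names as in the question):
# with open_text("Homo_sapiens.GRCh38.ncrna.fa.gz") as fh:
#     index_ids = fasta_ids(fh)
# with open_text("Homo_sapiens.GRCh38.99.gtf.gz") as fh:
#     gtf_ids = gtf_transcript_ids(fh)
# print(compare_ids(index_ids, gtf_ids))
```

If missing_ignoring_version comes out close to the GTF count minus the index count, the warning mostly reflects transcripts that simply are not in the ncRNA index; if missing_exact is much larger than that, version suffixes are the more likely culprit.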
I've read in the kallisto manual that I may also need a file of chromosome names and sizes, but I don't know how to get one.
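A chromosome name/length file can also be built directly from the Ensembl genome FASTA, which keeps the names consistent with the Ensembl GTF and with the genome loaded in IGV. A minimal sketch, assuming the genome file is Ensembl's Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz (the file name is an assumption; use whichever genome FASTA you load into IGV) and that kallisto wants one tab-separated "name, length" line per sequence, as I read its manual. The function names are my own.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file (chosen by the .gz extension)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def chrom_sizes(lines):
    """Yield (name, length) per FASTA record; name is the first token of the header."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].split()[0], 0  # Ensembl: ">1 dna:chromosome ..." -> "1"
        elif name is not None:
            length += len(line)  # count bases on wrapped sequence lines
    if name is not None:
        yield name, length


def write_chrom_sizes(fasta_path, out_path):
    """Write one 'name<TAB>length' line per sequence in the FASTA."""
    with open_text(fasta_path) as fh, open(out_path, "w") as out:
        for name, length in chrom_sizes(fh):
            out.write(f"{name}\t{length}\n")


# Usage (assumed file names):
# write_chrom_sizes("Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz", "chrom_sizes.txt")
```

The resulting file would then be passed to kallisto quant through its --chromosomes option (named in the kallisto 0.46 manual; check the quant help text of your version). Reading the whole genome line by line in Python is slow but dependency-free; if samtools is installed, samtools faidx on an uncompressed or bgzip-compressed FASTA writes a .fai index whose first two columns hold the same name and length information.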
Thanks so much
You can get the chromosome name/size file from UCSC. GRCh38 example.
Are you mixing and matching reference genome/transcriptome from two places (e.g. Ensembl/UCSC)?
No, my reference genome and transcriptome are both from Ensembl's website. I'll try incorporating the chromosome file.
Thanks
UCSC puts a "chr" prefix in front of chromosome numbers, but Ensembl does not. So if you use that file, make sure the IDs match by removing "chr".
Did you get this working? Did you find any issues related to the warning? I'm trying now and get the same kind of warning with matched GTF and transcript files from GENCODE 38. Thanks.
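If you go the UCSC route, the renaming step mentioned above (removing "chr") can be sketched as follows. Besides the prefix, the mitochondrial chromosome is chrM in UCSC but MT in Ensembl, and UCSC's alt/random/unplaced contigs (e.g. chr1_KI270706v1_random) and extra sequences such as chrEBV cannot be mapped to Ensembl names by dropping the prefix, so this sketch keeps only 1-22, X, Y and the mitochondrion and drops everything else. The input file name hg38.chrom.sizes and the function names are assumptions for illustration.

```python
# Primary chromosomes whose names differ between UCSC and Ensembl only by "chr".
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_ensembl_name(name):
    """Map chr1..chr22, chrX, chrY, chrM to Ensembl style; return None for anything else."""
    if not name.startswith("chr"):
        return None
    core = name[3:]
    if core == "M":
        return "MT"  # UCSC chrM corresponds to Ensembl MT
    return core if core in PRIMARY else None


def convert_chrom_sizes(lines):
    """Convert UCSC chrom.sizes lines to Ensembl names, dropping unmappable contigs."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        name, size = line.split()[:2]  # UCSC format: name<TAB>length
        new_name = ucsc_to_ensembl_name(name)
        if new_name is not None:
            out.append(f"{new_name}\t{size}\n")
    return out


def convert_file(in_path, out_path):
    """Read a UCSC sizes file and write the Ensembl-named version."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(convert_chrom_sizes(src))


# Usage (assumed file names):
# convert_file("hg38.chrom.sizes", "chrom_sizes.ensembl.txt")
```

Dropping the unplaced contigs is a deliberate simplification; reads on those scaffolds would need the names taken from the Ensembl genome FASTA itself.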