Hello,
I need to pseudo-align my paired reads to the transcriptome using Kallisto. I know that Kallisto does not use a reference genome sequence, and instead it performs pseudo-alignment to determine the compatibility of reads with targets (e.g. transcript sequences).
However, to build a Kallisto transcriptome index, how do I choose the reference transcriptomes for my targets, which are human and Cassava Brown Streak Virus?
I mean, when running the code below to create the Kallisto index from the transcriptome, should I specify which transcriptome I want to use (e.g. human or Cassava Brown Streak Virus)? If so, how do I know which transcriptome is appropriate for my target genomes?
cd
kallisto index -i Potra01-mRNA.idx \
~/share/Day01/data/reference/fasta/Potra01-mRNA.fa.gz
Thank you so much for your advice and guidance. Best wishes
The actual fasta files can be downloaded from public databases such as Ensembl, as described here and here. You want to look for the cDNA bit in the file name since you want to limit yourself to those parts of the genome that correspond to transcribed loci.
Thanks a lot Friederike for your guidance. After downloading the transcriptome fasta files, is the name of the downloaded fasta file what I put in place of the fa.gz file? And what about name.idx?
Many thanks.
I believe Lior was just trying to indicate that you can give the resulting index whatever name you want by putting it after -i. That is, if you want two indices, one for the human and one for the virus cDNA libraries, you run the command twice, each time with its own index name and fasta file.
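As a rough sketch of running the index command once per organism: the file names human_cdna.fa.gz and virus_cdna.fa.gz and the index names human.idx and virus.idx are made up for illustration, so substitute whatever you actually downloaded. The guard just skips a build when kallisto or the input file is not available.

```shell
# Hypothetical file names: replace them with the cDNA FASTA files you downloaded.
for species in human virus; do
    fasta="${species}_cdna.fa.gz"   # input transcriptome (assumed name)
    index="${species}.idx"          # output index; any name works after -i
    if command -v kallisto >/dev/null 2>&1 && [ -f "$fasta" ]; then
        kallisto index -i "$index" "$fasta"
    else
        echo "skipping $species: kallisto or $fasta not found" >&2
    fi
done
```

Each run writes one index file, which is then the file you pass to kallisto quant for the matching dataset.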
Many thanks Friederike. I could find fasta files for human and also for plants, and I built indices for them. However, I could not find a transcriptome fasta file for Cassava Brown Streak Virus or its close species (TAN70 virus). I would greatly appreciate it if you could tell me where I can get it.
Many thanks.
Sorry, I've never had to download a viral cDNA index, so I'd have to resort to the usual tools (google etc.) just like you.
OK. Thank you very much Friederike.
Thank you very much Lior. In fact, I want to build separate indices for human and for Cassava Brown Streak Virus. I have two different RNA-seq datasets (one for human and another one for Cassava Brown Streak Virus). I need to know how I can obtain the transcriptomes for human and for Cassava Brown Streak Virus separately.
Then, I want to know what exactly I should write for name.idx and for the fa.gz file in each case, human and Cassava Brown Streak Virus.
Many thanks for the help.
I tried this first...but it resulted in an index around 2.3Gb that failed in the kallisto quant step (and had weird errors like bazillions of equivalence classes, ran out of memory, etc)
But then I unzipped and concatenated them and tried again...and got an index of 4.4Gb, and it worked with the kallisto quant step
Also, don't forget about the 'kallisto inspect' feature. It was helpful to run 'kallisto inspect' on the new index, without having to do a full kallisto quant run.
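A minimal sketch of the unzip-and-concatenate route described above, followed by 'kallisto inspect'. The input names (first_cdna.fa.gz, second_cdna.fa.gz) and output names (combined_cdna.fa, combined.idx) are placeholders, not the files actually used in this thread.

```shell
# Hypothetical names: substitute your own gzipped FASTA files.
combined="combined_cdna.fa"
combined_index="combined.idx"

# Decompress each file and append it to one plain-text FASTA.
: > "$combined"
for fasta in first_cdna.fa.gz second_cdna.fa.gz; do
    if [ -f "$fasta" ]; then
        gunzip -c "$fasta" >> "$combined"
    else
        echo "missing input: $fasta" >&2
    fi
done

# Build the index from the combined file, then sanity-check it
# with 'kallisto inspect' before spending time on a quant run.
if command -v kallisto >/dev/null 2>&1 && [ -s "$combined" ]; then
    kallisto index -i "$combined_index" "$combined"
    kallisto inspect "$combined_index"
fi
```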