Hi. I am trying to align RNA-seq data with kallisto, but I think the answer to my question may apply to other, similar situations.
I have 10 fastq.gz files, and I asked kallisto to process all of them at the same time with *.gz. It returned only 2 files: a .tsv and a .json,
whereas if I ran kallisto 10 times, once per fastq.gz file, it would return 10 .tsv and 10 .json files, right?
I am wondering whether the information I get in these two situations is the same, and whether either way is better for my downstream analysis.
Many thanks for your help.
Yes. I just wanted to know whether one strategy is preferred, because this is the first time I am doing this. I'll keep them separate then. Thank you, Istvan.
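For anyone landing here later: keeping the samples separate amounts to one `kallisto quant` call per file, each with its own output directory, whereas passing `*.gz` to a single call makes kallisto treat all the reads as one sample. Below is a minimal sketch of the per-sample approach, under some assumptions that are not from this thread: the reads are single-end (hence `--single` with `-l` and `-s`, whose values here are placeholders), the index file is called `transcripts.idx`, and the helper name `build_quant_commands` is made up for illustration. It only prints the commands so they can be checked before running anything.

```python
from pathlib import Path
import shlex


def build_quant_commands(fastq_files, index, out_root,
                         fragment_length=200, fragment_sd=20):
    """Build one `kallisto quant` command per FASTQ file (single-end assumed).

    fragment_length and fragment_sd are placeholder values; replace them
    with the estimates for your own library.
    """
    commands = []
    for fq in sorted(str(f) for f in fastq_files):
        # Sample name = file name up to the first dot,
        # e.g. "sampleA.fastq.gz" -> "sampleA".
        sample = Path(fq).name.split(".")[0]
        commands.append([
            "kallisto", "quant",
            "-i", str(index),                   # index built with `kallisto index`
            "-o", str(Path(out_root) / sample), # one output directory per sample
            "--single",                         # single-end reads (assumption)
            "-l", str(fragment_length),
            "-s", str(fragment_sd),
            fq,
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for every *.fastq.gz in the current directory.
    # To execute them, pass each list to subprocess.run(cmd, check=True).
    files = Path(".").glob("*.fastq.gz")
    for cmd in build_quant_commands(files, "transcripts.idx", "quant"):
        print(shlex.join(cmd))
```

Each output directory then holds that sample's own abundance .tsv and run-info .json, which is what most downstream tools expect. For paired-end data, drop `--single`, `-l`, and `-s`, and pass the two mate files of each sample together instead.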
Istvan's answer is correct, but I should add a technical note: there is a way to run all samples together in a single kallisto run while maintaining sample identity. This involves using the kallisto | bustools workflow (which I imagine will eventually be the standard workflow for running kallisto, even for bulk). This is advantageous in cases where you want to preserve the "raw" kallisto output, i.e. the equivalence classes and their associated transcript-compatibility counts (TCCs). Equivalence classes differ between kallisto runs, so if you're interested in TCCs (e.g. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0970-8 ), then you should run all your files together in a single batch.
(If this is confusing, don't worry about it)
Thanks dsull. Great to know about that. Much appreciated.