I have pseudo-aligned some RNA-seq reads using Kallisto, where each sample was split over two flow cells.
I pseudo-aligned the fastq files for each flow cell separately, meaning for each sample I have two abundance TSV files that look like this:
target_id          length  eff_length  est_counts  tpm
ENST00000434970.2  9       7           0           0
ENST00000448914.1  13      11          0           0
ENST00000415118.1  8       6           1           0
ENST00000631435.1  12      10          0           0
ENST00000632684.1  12      10          0           0
I would like to merge each sample's two TSV files into one and aggregate the transcript level counts to gene-level counts with tximport in R.
I'm unsure about the best way to approach this. The length column is the same in both TSV files, and the est_counts just need to be added together, but I'm not sure about eff_length and tpm. Does tximport require this information?
I would appreciate some advice.
Thank you!
Do you intend to use DESeq2 afterwards? If so, one option you can explore is DESeq2's collapseReplicates function. I know this is not a direct answer, since it applies after tximport, but it is worth a look. Alternatively, why did you not merge these earlier? I normally use Salmon, which lets me combine technical replicates at the mapping step, resulting in a single output file, and I would be surprised if Kallisto doesn't have a way to do this as well.
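If re-running the quantification is not practical, the per-file merge described in the question could be sketched roughly as below. This is a minimal Python sketch, not something Kallisto or tximport provides. The function names (read_abundance, merge_abundance, write_abundance) are made up for illustration. Summing est_counts follows the question; taking a count-weighted mean of eff_length is an assumption of this sketch, an approximation rather than what a joint run over both flow cells would estimate. tpm is then recomputed from the merged counts and effective lengths. The sketch also assumes both files list the same transcripts in the same order.

```python
import csv

COLUMNS = ["target_id", "length", "eff_length", "est_counts", "tpm"]


def read_abundance(handle):
    """Read a kallisto-style abundance TSV from an open text handle."""
    return list(csv.DictReader(handle, delimiter="\t"))


def merge_abundance(rows_a, rows_b):
    """Merge two abundance tables belonging to the same sample.

    Assumptions of this sketch (not prescribed by kallisto or tximport):
    est_counts are summed, eff_length is a count-weighted mean of the two
    runs, and tpm is recomputed from the merged values.
    """
    if len(rows_a) != len(rows_b):
        raise ValueError("tables have different numbers of transcripts")

    merged = []
    for a, b in zip(rows_a, rows_b):
        if a["target_id"] != b["target_id"]:
            raise ValueError("transcript order differs between tables")
        counts_a, counts_b = float(a["est_counts"]), float(b["est_counts"])
        eff_a, eff_b = float(a["eff_length"]), float(b["eff_length"])
        counts = counts_a + counts_b
        if counts > 0:
            # Weight each run's effective length by the reads it contributed.
            eff = (counts_a * eff_a + counts_b * eff_b) / counts
        else:
            # No reads in either run: fall back to the plain mean.
            eff = (eff_a + eff_b) / 2
        merged.append({
            "target_id": a["target_id"],
            "length": a["length"],
            "eff_length": eff,
            "est_counts": counts,
            "tpm": 0.0,
        })

    # TPM: per-transcript rate (counts / eff_length), scaled to sum to 1e6.
    rates = [
        m["est_counts"] / m["eff_length"] if m["eff_length"] > 0 else 0.0
        for m in merged
    ]
    total = sum(rates)
    for m, rate in zip(merged, rates):
        m["tpm"] = 1e6 * rate / total if total > 0 else 0.0
    return merged


def write_abundance(rows, handle):
    """Write merged rows back out in the same five-column TSV layout."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

To use it, open the two per-flow-cell TSV files for a sample, pass each to read_abundance, merge the results with merge_abundance, and write the output with write_abundance to a new file that tximport can read. As far as I know, tximport reads the est_counts, tpm and eff_length columns from Kallisto TSV output, so all three should be kept consistent in the merged file. If you only need gene-level counts for DESeq2, summing after import with collapseReplicates avoids having to approximate eff_length at all.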