Hello Biostars,
I used the STAR aligner to align a set of single-cell FASTQ files and then ran RSEM for quantification. Now I have 96 sets of RSEM outputs, and I want to combine them (preferably the TPM columns) into a single matrix.
I searched online but could not find an easy way to do it. The Trinity de novo transcriptome assembly tool may include a Perl script that applies, but I did not understand how to use it for this purpose. RSEM's "rsem-generate-data-matrix" seems simple to use, but the filenames have to be given manually as inputs, which would be very cumbersome in my situation, and the command probably would not accept 96 files.
So, does anybody here know a better way to do this? Is there a ready-made tool for it? I would like to extract the TPMs, if possible.
Any help would be appreciated!
Thanks a lot.
PR
I don't know the RSEM output format. If "rsem-generate-data-matrix" does the job, you need not worry about passing 96 files on the command line if you are working on Linux, because the limit on command-line arguments is far higher than that. You can use a wildcard that matches a pattern in the filenames to pass all the files as arguments. What do your filenames look like?
Sorry, posted my reply as an answer by mistake. Apologies.
If the row order is the same in all 96 files, you can loop over this command:
paste <(awk '{print $2}' "$FILE_1") <(awk '{print $2}' "$FILE_2")
$2 is the column holding the TPM values (change the number to match your files), and $FILE_1 and $FILE_2 are the file names. The example above is for two files, but you can extend it with a loop.
Thanks for the reply. I actually ended up writing an R script to loop through the files, check the row order, and build the matrix.
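For anyone who finds this thread later, here is a rough shell sketch of the loop, check-order, build-matrix idea discussed above. It is not the R script mentioned in the thread, just one possible way to do it. It assumes tab-separated files with a header line, IDs in the first column, and a column whose header is literally "TPM" (which is what I would expect from RSEM's .genes.results files, but please check yours). The function name tpm_matrix and the output name below are made up for illustration.

```shell
#!/bin/sh
# Build an ID-by-sample TPM matrix from several RSEM-style result files.
# Assumptions (not from the thread): tab-separated input with a header line,
# IDs in column 1, and a column whose header is exactly "TPM".
# Aborts with exit status 1 if the files disagree on row order or row count.
tpm_matrix() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                          # header line of each input file
            if (nfiles > 0 && last != nrows) {
                print "row count differs before " FILENAME > "/dev/stderr"
                bad = 1; exit 1
            }
            nfiles++
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "TPM") col = i
            if (col == 0) {
                print "no TPM column in " FILENAME > "/dev/stderr"
                bad = 1; exit 1
            }
            name = FILENAME
            sub(/.*\//, "", name)           # drop the directory part
            sub(/\..*$/, "", name)          # drop everything after the first dot
            header = header OFS name
            next
        }
        nfiles == 1 { ids[FNR] = $1; row[FNR] = $1; nrows = FNR }
        nfiles > 1 && $1 != ids[FNR] {      # same ID must sit on the same line
            print "row order differs in " FILENAME " at line " FNR > "/dev/stderr"
            bad = 1; exit 1
        }
        { row[FNR] = row[FNR] OFS $col; last = FNR }
        END {
            if (bad || last != nrows) exit 1
            print "gene_id" header
            for (r = 2; r <= nrows; r++) print row[r]
        }
    ' "$@"
}
```

With files named like sample1.genes.results (an assumed naming pattern), it could be called as: tpm_matrix *.genes.results > tpm_matrix.tsv. The column headers are taken from the file names up to the first dot, so samples whose names contain dots would need a different rule.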