I am working on the GSE 102741 dataset. I have both the raw gene count data and the log2RPKM data, and I want to assess the quality of the dataset using PCA, following a published paper. How can I calculate the TIN score? Are there any online tools? Could somebody guide me on how to assess the quality of the dataset, or suggest better guidelines for performing quality analysis on the raw gene counts or the log2RPKM values?
But `tin.py` is quite slow in computing TIN, as it processes transcripts sequentially. I have a large BAM file of ~35 GB, and it took 18 hours to process. Is there a way to speed it up using multithreading or multiprocessing?
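
To make the idea concrete, here is a minimal sketch of one way this might be parallelized without modifying `tin.py` itself: split the gene-model BED file into chunks of transcripts and run one `tin.py` process per chunk. It assumes that `tin.py` is on the PATH, that it accepts `-i` (BAM) and `-r` (BED12 gene model), that the BAM index sits next to the BAM file, and that it writes its output into the current working directory (hence one directory per chunk). The function names `split_bed`, `run_chunk`, and `parallel_tin` are hypothetical helpers, not part of RSeQC, and I have not benchmarked this.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_bed(bed_lines, n_chunks):
    """Distribute BED records round-robin into at most n_chunks lists."""
    records = [
        ln.rstrip("\n")
        for ln in bed_lines
        # Skip blank lines and common BED header lines.
        if ln.strip() and not ln.startswith(("#", "track", "browser"))
    ]
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    return [c for c in chunks if c]  # drop empty chunks


def run_chunk(bam, chunk_records, workdir):
    """Run one tin.py process on a subset of transcripts in its own directory."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    bed_path = workdir / "chunk.bed"
    bed_path.write_text("\n".join(chunk_records) + "\n")
    # Assumption: tin.py is on PATH and writes results into the working directory.
    subprocess.run(
        ["tin.py", "-i", str(Path(bam).resolve()), "-r", bed_path.name],
        cwd=workdir,
        check=True,
    )
    return workdir


def parallel_tin(bam, bed, n_workers=8, out_root="tin_chunks"):
    """Split the BED file and run the chunks concurrently."""
    with open(bed) as fh:
        chunks = split_bed(fh, n_workers)
    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(run_chunk, bam, chunk, Path(out_root) / f"chunk_{i:03d}")
            for i, chunk in enumerate(chunks)
        ]
        return [f.result() for f in futures]

# Example call (hypothetical file names):
# parallel_tin("sample.bam", "gene_model.bed12", n_workers=8)
```

The per-transcript tables from each chunk directory would then need to be concatenated, and any summary statistics (for example the median TIN) recomputed from the merged table rather than taken from the per-chunk summaries. Disk I/O on a 35 GB BAM may limit how much speedup extra workers actually give.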