6.5 years ago
wt215
Hi,
The number of cells in an scRNA-seq experiment can be very large. For example, a recent 10X dataset contains around 1.3 million cells.
R seems to have trouble even loading the raw gene-cell expression count table. I am not very familiar with bioinformatics in Python; can Python handle such large datasets easily?
Given datasets of this size, many normalization methods that rely on Bayesian inference or optimization algorithms could be time-consuming. Which language do you think would win, R or Python?
Thanks in advance.
Software is only as good as the underlying algorithm. If that is flawed then software (using that algorithm) running faster with one particular language does not make that language/package a winner.
Good programmers will work around technical difficulties. Parts of a program can be coded in a different language (if that offers technical advantages) and then called from within a program.
Yes, I agree. I am a bit worried that hardware development cannot keep up with the development of scRNA-seq techniques.
The data keeps getting bigger, both the raw sequencing FASTQ files and, as a consequence, the number of cells stored in the count table.
I really hope that one day my laptop will be able to handle both FASTQ preprocessing and downstream analysis easily.
Who told you that was an acceptable platform?
Large datasets are always going to require access to appropriately sized hardware. Ideally you would be able to have access via your company/institute/university but if that is not an option then cloud based providers do have solutions that will fit, even now. They will be pricey to pay out of pocket.
Your laptop (if it retains that form factor in future) may handle much larger data but we are sadly a ways away from that day.
Neither of those is software; they are programming languages. Both can be completely shit when you don't use them right, and both can solve your issue with loading raw gene-cell expression data if you use them correctly.
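A minimal Python sketch of one common way to "use them correctly" here: keep the count table in a sparse format instead of a dense table, since most gene-cell entries are zero. The gene count (30,000), the toy matrix dimensions and the random data are all assumptions chosen for illustration, and `dense_bytes` / `csr_bytes` are hypothetical helpers, not part of any scRNA-seq package; only NumPy and SciPy's `scipy.sparse.csr_matrix` are used.

```python
import numpy as np
import scipy.sparse as sp


def dense_bytes(n_cells, n_genes, itemsize=4):
    """Bytes needed to hold a full cell-by-gene table densely (int32 by default)."""
    return n_cells * n_genes * itemsize


def csr_bytes(matrix):
    """Bytes used by a CSR matrix: stored values + column indices + row pointers."""
    return matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes


# Back-of-the-envelope: 1.3 million cells x an ASSUMED ~30,000 genes, int32 counts.
full_dense_gb = dense_bytes(1_300_000, 30_000) / 1e9
print(f"dense 1.3M-cell table: ~{full_dense_gb:.0f} GB")

# Hypothetical toy table: 2,000 cells x 5,000 genes with at most 500,000
# non-zero entries at random positions (duplicate positions are summed).
rng = np.random.default_rng(0)
n_cells, n_genes, n_entries = 2_000, 5_000, 500_000
rows = rng.integers(0, n_cells, size=n_entries)
cols = rng.integers(0, n_genes, size=n_entries)
vals = rng.integers(1, 10, size=n_entries).astype(np.int32)
counts = sp.csr_matrix((vals, (rows, cols)), shape=(n_cells, n_genes))

print(f"toy table dense:  {dense_bytes(n_cells, n_genes) / 1e6:.1f} MB")
print(f"toy table sparse: {csr_bytes(counts) / 1e6:.1f} MB")

# Per-cell totals (library sizes) computed without ever densifying the matrix.
library_sizes = np.asarray(counts.sum(axis=1)).ravel()
```

The same idea is available in R through the Matrix package's compressed sparse column class (`dgCMatrix`), so the saving comes from the data structure rather than from the choice of language.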
A lot of scRNAseq packages are written in R.
Sorry, my mistake: it should be "language" rather than "software". Thanks for pointing it out.
Your bottleneck is likely not going to be the choice of language. It's going to be the availability of existing packages that do what you want to do. Python will likely be faster for loading large datasets, but if there aren't already packages for scRNA-seq analysis, are you going to spend the time to write your own? I guess it will come down to which is the better use of your time: writing something new in the faster language, or cobbling together existing tools in either language to accomplish your goal.
Languages are tools and if Python and R are my only choices, I pick Rython.