R vs Python
I just would like to understand why R analysis prevails in RNA-Seq analysis, since I'm studying Python and would like to do all of my projects with the same language.
My guess... Around 1995-2000 biology became (more) digital, quantitative and larger in data throughput. Around that time the technologies associated with it became more affordable, most notably microarrays. Back then, R (first released in 1993) must have appeared as the most sensible choice for data analysis since it was open source, script-based, and designed to work with tabular data, specifically to perform statistical analyses. What would the other options have been? Maybe SAS or SPSS, but they were/are closed source and not as flexible as R. Perl and Python were not suitable for data analysis back then (pandas was first released only in 2008, matplotlib in 2003). From there on R and biology went hand-in-hand.
I'm studying Python and would like to do all of my projects with the same language
If I really had to pick one language for bioinformatics projects, I would go for R. However, I think it is a good investment to use both R and Python as appropriate, even if you find yourself juggling multiple languages. In the end, you are pretty much bound to throw in some bash commands anyway.
In addition to the reasons mentioned above, I think it is more a cultural choice than a practical one. In the scientific community (at least in Brazil, where I speak from) we are strongly encouraged to use R to make our plots and statistical tests, so we probably carry that culture over to RNA-seq analysis.
General reason
Unlike DNA-based analysis, bulk RNA-seq analysis revolves around the gene expression matrix. People used to handle this kind of data in MS Excel, SPSS, or SAS.
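To make the "gene expression matrix" concrete for someone coming from Python, here is a minimal sketch using pandas. The counts, gene names and sample names are made up for illustration, and the counts-per-million scaling plus log fold change shown here is only a toy summary, not a substitute for the statistical models in DESeq2, edgeR or limma.

```python
import numpy as np
import pandas as pd

# Hypothetical raw read counts: rows are genes, columns are samples.
counts = pd.DataFrame(
    {
        "ctrl_1": [100, 300, 600],
        "ctrl_2": [200, 200, 600],
        "treat_1": [400, 100, 500],
        "treat_2": [800, 200, 1000],
    },
    index=["geneA", "geneB", "geneC"],
)

# Counts per million (CPM): divide each sample (column) by its library size.
library_sizes = counts.sum(axis=0)
cpm = counts.div(library_sizes, axis=1) * 1e6

# Log-transform with a pseudocount of 1 to avoid log(0).
log_cpm = np.log2(cpm + 1)

# Hypothetical grouping of samples into conditions.
groups = {"ctrl": ["ctrl_1", "ctrl_2"], "treat": ["treat_1", "treat_2"]}

# Naive log2 fold change: difference of mean log-CPM between the two groups.
log_fc = log_cpm[groups["treat"]].mean(axis=1) - log_cpm[groups["ctrl"]].mean(axis=1)

print(log_fc.sort_values(ascending=False))
```

The point is only that the same genes-by-samples table that R handles as a matrix or data frame maps directly onto a pandas DataFrame.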
Thanks to the contributions of the R community, R became much more popular than those tools. It's easy to use, fast and scalable. People usually write the compute-heavy functions in C++ and wrap them as R packages.
Later scRNA-seq and other omics data analysis tools were mainly developed on this basis.
For Python, omics data analysis is just a glimpse of its power.
Specific event
The publications of DESeq2, limma, edgeR, ggplot2, Seurat and some other well-known R packages strongly influenced the community.
The question then is why DESeq2/limma/edgeR etc. were R packages, rather than Python packages.
The answer is that they were created by people with a statistical background working in biology, for the use of people with a biology background, rather than being created by people with a computer science/software background.
Stats people like R because it was written by stats people to do stats in. It has first-class support for tabular data and statistical models. It has fast, built-in code for fitting statistical models. It thinks the way a statistician thinks.
It also has a long history of use by people who class themselves as "non-coders". Long before there was literate coding and notebooks, typing at the REPL and evaluating the output (including plots) before deciding what to do next was a common way of working in R. This appeals to people who want to think of it as a piece of software for analysing their data, rather than a language they have to "program" in. Finally, R has a long history of packages having very detailed step-by-step tutorials in the form of "vignettes", which have first-class language support.
I just would like to understand why R analysis prevails in RNA-Seq analysis, since I'm studying Python and would like to do all of my projects with the same language.
I'm afraid you can ask the same question for any language.
90% of the posts on Biostars about RNA-Seq analysis talk about edgeR, GSEA, and other R packages. Is it worth learning Biopython if I'm more familiar with Python than R?
Python in bioinfo analysis is "fairly" new, while R has been around forever. That's why the majority of core RNA-Seq analyses are done in R. If you want to use Python, a lot of packages have already been ported (e.g. PyDESeq2), or alternative packages are available. If you prefer Python and the libraries/methods you need are available in Python, then don't worry about R.
There is PyDESeq2 if you want to stay in Python: https://pydeseq2.readthedocs.io/en/latest/ That said, people commonly use DESeq2 in R.