Why are R scripts preferred to Biopython for RNA-Seq analyses?
3 months ago

R vs Python

I'd just like to understand why R prevails in RNA-Seq analysis, since I'm studying Python and would like to do all of my projects in the same language.

I'm afraid you can ask the same question for any language.

I just would like to understand why XXXXX analysis prevails in YYYYY, since I'm studying ZZZZZZ and would like to do all of my projects with the same language.

Around 90% of the posts on Biostars about RNA-Seq analysis talk about edgeR, GSEA, and other R packages. Is it worth learning Biopython if I'm more familiar with Python than R?

Python in bioinformatics analysis is "fairly" new, while R has been around forever. That's why the majority of core RNA-Seq analyses are done in R. If you want to use Python, a lot of packages have already been ported (e.g. PyDESeq2), or alternative packages are available. If you prefer Python and the libraries/methods you need are available in it, then don't worry about R.

People commonly use DESeq2 in R, but if you want to stay in Python there is PyDESeq2: https://pydeseq2.readthedocs.io/en/latest/

3 months ago

My guess... Around 1995-2000 biology became (more) digital, quantitative, and higher-throughput. Around that time the associated technologies, most notably microarrays, became more affordable. Back then, R (first released in 1993) must have seemed the most sensible choice for data analysis: it was open source, script-based, and designed to work with tabular data specifically for statistical analysis. What other options were there? Maybe SAS or SPSS, but they were (and still are) closed source and not as flexible as R. Perl and Python were not suitable for data analysis back then (pandas was first released only in 2008, matplotlib in 2003). From there on, R and biology went hand in hand.

I'm studying Python and would like to do all of my projects with the same language

If I really had to pick one language for bioinformatics projects, I would go for R. However, I think it is a good investment to use both R and Python as appropriate, even if you find yourself juggling multiple languages. In the end, you are pretty much bound to throw in some bash commands anyway.

3 months ago
Gabriel

In addition to the reasons mentioned above, I think it is more a cultural choice than a practical one. In the scientific community (at least in Brazil, where I'm writing from) we are strongly encouraged to use R for our plots and statistical tests, so we probably carry that culture over to RNA-seq analysis.

3 months ago
JustinZhang

General reason

Unlike DNA-based analysis, bulk RNA-seq analysis revolves around the gene expression matrix. This kind of data used to be handled in MS Excel, SPSS, or SAS.
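
To make "the gene expression matrix" concrete for a Python learner, here is a minimal sketch in plain Python (no libraries). It assumes a toy genes-by-samples count matrix with invented gene names, sample names and counts, and applies counts per million (CPM), one of the simplest library-size normalizations. This is only an illustration: DESeq2 and edgeR use their own normalization methods (median-of-ratios and TMM, respectively), not plain CPM.

```python
# Toy bulk RNA-seq count matrix: one row per gene, one column per sample.
# Gene names, sample names and counts are invented for illustration only.
samples = ["s1", "s2", "s3"]
counts = {
    "geneA": [10, 20, 30],
    "geneB": [90, 180, 70],
    "geneC": [0, 0, 0],
}


def cpm(matrix, n_samples):
    """Counts per million: scale each sample (column) by its library size."""
    # Library size = total number of reads counted in each sample (column sum).
    lib_sizes = [sum(row[j] for row in matrix.values()) for j in range(n_samples)]
    return {
        gene: [row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
        for gene, row in matrix.items()
    }


normalized = cpm(counts, len(samples))
for gene, values in normalized.items():
    print(gene, values)
```

In this toy example, s2 has twice the library size of s1, so after normalization geneA and geneB get identical values in s1 and s2: the raw differences only reflected sequencing depth.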

Thanks to contributions from its community, R became much more widespread and popular than those tools. It's easy to use, fast, and scalable. People usually write the computationally intensive functions in C++ and package them as R packages.

Later scRNA-seq and other omics analysis tools were mainly developed on this foundation.

For Python, by contrast, omics data analysis is only a small slice of what the language is used for.

Specific event

The publication of DESeq2, limma, edgeR, ggplot2, Seurat, and other well-known R packages strongly influenced the community.

The question then is why DESeq2/limma/edgeR etc. were R packages rather than Python packages.

The answer is that they were created by people with a statistical background working in biology, for use by people with a biology background, rather than by people with a computer science/software background.

Stats people like R because it was written by stats people to do stats in. It has first-class support for tabular data and statistical models, and fast, built-in code for fitting those models. It thinks the way a statistician thinks.

It also has a long history of use by people who class themselves as "non-coders". Long before literate coding and notebooks were common, typing at the REPL and evaluating the output (including plots) before deciding what to do next was a common way of working in R. This appeals to people who want to think of R as a piece of software for analysing their data, rather than a language they have to "program" in. Finally, R has a long history of packages shipping very detailed step-by-step tutorials in the form of "vignettes", which have first-class language support.

"typing at the REPL and evaluating the output (including plots) before deciding what to do next" is (not was!) a common way of working in R.

:)

+1. I just write stuff in vim and copy it into the bare R terminal. No RStudio!
