Hello all,
I wanted to discuss a question with you that I've been asking myself during a project. In this project, I had to read.table() a large number of files, which were provided to us by multiple research groups.
As the data was provided in multiple formats that differed significantly, I wrote multiple 'reading scripts'. This allowed me to experiment with both scripts based upon looping and scripts based upon functions.
Loop-based scripts consisted mostly of one big loop, which performed many alterations on one measurement at a time. Function-based scripts took entire lists or arrays of measurements and applied functions to these using the apply family of functions. This meant that one small alteration was performed on a large number of measurements at a time.
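To make the comparison concrete, here is a minimal sketch of the two styles. The input data, the column names (id, value) and the steps (rescaling, tagging with a group name) are made up for illustration, and the "files" are kept in memory as text so the example runs without anything on disk.

```r
# Hypothetical input: three tiny "files", held as text for illustration only.
raw <- list(
  groupA = "id value\n1 10\n2 20",
  groupB = "id value\n1 30\n2 40",
  groupC = "id value\n1 50\n2 60"
)

# Loop-based style: all steps are performed on one file at a time.
loop_result <- vector("list", length(raw))
names(loop_result) <- names(raw)
for (name in names(raw)) {
  df <- read.table(text = raw[[name]], header = TRUE)  # step 1: read
  df$value <- df$value / 10                            # step 2: rescale
  df$group <- name                                     # step 3: tag with group
  loop_result[[name]] <- df
}

# Function-based style: one small step is applied to all files at a time.
tables <- lapply(raw, function(txt) read.table(text = txt, header = TRUE))
scaled <- lapply(tables, function(df) {
  df$value <- df$value / 10
  df
})
apply_result <- Map(function(df, name) {
  df$group <- name
  df
}, scaled, names(scaled))

# Compare the two results.
all.equal(loop_result, apply_result)
```

In the loop version you can follow a single file from top to bottom; in the apply version each line describes one step for the whole collection.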
Now, comparing these two types of scripts, I feel like the loop-based ones are more intuitive to read than the function-based ones. You see what happens to a measurement, step by step. This is in contrast to the function-based scripts, which are repeatedly broken up by apply. This might be relevant for sharing the scripts with colleagues who are not bioinformaticians.
I would like to note that I know that functions and the apply family are supposed to run faster than loops, but this is not as important to me as readability. One possible perk of using functions is that they can be debugged in RStudio.
I would like to hear your views!
Could you clarify what is meant by "loops" and "functions"?
Imperative R is gross! :)
The short answer is:
If the speed penalty isn't an issue, take readability (if it's that high a priority).
If the speed is necessary, document/comment your code more thoroughly.