2.6 years ago
field654
Dear folks,
I wonder if there's any platform that allows me to manipulate NGS data by coding.
I used to apply NGS to verify some experiments, such as:
- Hybridoma IgG sequencing
- Quantifying error occurrence in error-prone PCR
In those cases I could simply feed a tiny portion of the data into R and manipulate it by coding.
While it saved me much time from reading package manuals, the limitation is that R is usually single-threaded and weak at handling large data sets.
I wonder if there's any R counterpart that's more suitable for NGS data analysis.
Thank you so much.
Field
Not entirely true. I think the R interpreter itself is still single-threaded, but there are plenty of packages that make use of multithreading and the multiple cores most modern machines have, thanks to having backends written in C/C++ (e.g., data.table). See here and here for discussions. This page might also be helpful.
The data set size issues are more a memory issue than an R issue. This can be alleviated by chunking data, increasing available memory, or doing both. Alternatively, you could use something like the bigmemory package or interface with a proper database like SQL.
If you're really insistent on using another language, I guess the only viable alternatives would be Julia or python (or perl if you are really old school), unless you want to attempt data manipulation in rust or C.
Dear Sir. Thank you very much for your advice. I've looked into multi-threaded calculation. However, I soon ran into a problem. Basically, I could create multiple clusters, but the CPUs refuse to handle them in parallel. Rather, one got processed while the others were waiting. I posted a more detailed question elsewhere, which I thought was more appropriate for computer questions. I would appreciate any advice you could share. Many thanks.
Field
https://stackoverflow.com/questions/72186970/rstudio-refuse-to-employ-more-cores-not-about-code
Much of NGS data analysis is CLI-driven, and the frameworks are available in multiple languages (python, groovy, etc.). R is mostly used for statistics and graphing, which generally come at the end of the analysis. You can also buy commercial statistics software such as S-PLUS, SPSS, or SAS for better performance. R also has an enhanced distribution, Microsoft R Open (hosted on MRAN), which uses multithreaded math libraries to make use of multiple cores.
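To illustrate the chunking suggestion made earlier in this thread, here is a minimal sketch in Python, one of the alternative languages mentioned above. It streams a FASTQ file and counts bases while holding only one chunk of reads in memory at a time. The function names (read_fastq, chunked, count_bases) and the chunk size are hypothetical choices for illustration, and the parser assumes the common unwrapped four-lines-per-record FASTQ layout; it is not a replacement for a dedicated parser.

```python
import io
from collections import Counter
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream.

    Assumption: records are not line-wrapped (exactly 4 lines each).
    """
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:  # end of file (or a truncated final record)
            return
        name, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield name[1:], seq, qual  # drop the leading "@" from the read name


def chunked(records, size):
    """Group an iterator of records into lists of at most `size` items."""
    records = iter(records)
    while True:
        chunk = list(islice(records, size))
        if not chunk:
            return
        yield chunk


def count_bases(handle, chunk_size=100_000):
    """Count bases over all reads, keeping only one chunk in memory at a time."""
    totals = Counter()
    n_reads = 0
    for chunk in chunked(read_fastq(handle), chunk_size):
        for _name, seq, _qual in chunk:
            totals.update(seq)
        n_reads += len(chunk)
    return n_reads, totals


# Tiny in-memory stand-in for a real file; with real data you would pass
# a handle from open(path) or gzip.open(path, "rt") instead.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAACC\n+\nIIII\n@r3\nGG\n+\nII\n")
n_reads, totals = count_bases(demo, chunk_size=2)
print(n_reads, dict(totals))
```

Because both helpers are generators, memory use is bounded by the chunk size rather than the file size. The same pattern applies in R by reading a fixed number of records per iteration instead of loading the whole file.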