3.5 years ago
salvatore.raieli2
Hi everyone,
I want to download data from NCBI GEO. I would like to run a query similar to the one below and retrieve the data so that I can store the phenotypic information, the expression matrix, and the clinical information in three separate dataframes.
I normally do this in R, but I want to do it in Python.
# example R code I normally use
library("GEOquery")
GSE94499 <- getGEO('GSE94499', GSEMatrix=TRUE)
gse <- GSE94499$GSE94499_series_matrix.txt.gz
ch <- pData(gse)  # sample-level (phenotype/clinical) annotations
f <- fData(gse)   # feature-level (probe) annotations
y <- exprs(gse)   # expression matrix
Do you know of any library that does this in Python? Could you provide some sample code?
Thank you very much for your help.
Is there any specific reason for using Python?
I am not sure whether a similar package exists, but you can work around this with rpy2 (a Python interface to R).
Actually, yes: this step has to fit into a data pipeline that is written in Python, so it needs to be Python code that I can put inside a function.
There are ways to integrate an R script with a Python script; Snakemake is one of them. Have you explored it (or any other, even simpler method, such as shell scripts)?
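As a rough illustration of the "simpler method" route, the R step could be launched from Python as an external process. This is only a sketch: the script name `fetch_geo.R` and its two positional arguments (accession, output directory) are hypothetical, and it assumes `Rscript` is on the PATH.

```python
import subprocess


def build_rscript_command(script_path, accession, out_dir):
    """Return the argument list for running an R script non-interactively."""
    # --vanilla skips saved workspaces and user profiles for reproducible runs.
    return ["Rscript", "--vanilla", str(script_path), accession, str(out_dir)]


def run_rscript(script_path, accession, out_dir):
    """Run the (hypothetical) R script; raises CalledProcessError on failure."""
    command = build_rscript_command(script_path, accession, out_dir)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical script name and arguments, shown without executing anything.
    print(build_rscript_command("fetch_geo.R", "GSE94499", "geo_out"))
```

The R script would then write its tables to files that the Python side reads back, which keeps the two languages decoupled at the cost of starting an R process for each call.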
I would prefer not to embed an R script. The idea was to create a Python class that downloads the data and then does some pre-processing. I have tried embedding R scripts in other cases, but it was quite slow (at least in my experience). I would also like to keep the code as short and simple as possible, since this is one step in a project shared with other people (all Python users).
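For a pure-Python route, one option is to parse the series matrix file directly. The sketch below uses only the standard library and assumes a file such as `GSE94499_series_matrix.txt.gz` has already been downloaded. It relies on the usual layout of these files: tab-separated `!Sample_...` lines for per-sample metadata, and the expression table between `!series_matrix_table_begin` and `!series_matrix_table_end`. The function names are made up for illustration. Probe annotations (the `fData` part) are not stored in this file; they come from the platform (GPL) record and are not handled here.

```python
import csv
import gzip


def open_series_matrix(path):
    """Open a plain or gzip-compressed series matrix file in text mode."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", newline="")


def parse_series_matrix(handle):
    """Return (sample_ids, sample_metadata, expression) from a text handle."""
    samples = {}     # metadata field -> list with one value per sample
    expression = {}  # probe ID -> list of floats (None where missing)
    header = None    # first row of the table: ID_REF followed by sample IDs
    in_table = False

    for row in csv.reader(handle, delimiter="\t", quotechar='"'):
        if not row or not row[0]:
            continue
        key = row[0]
        if key == "!series_matrix_table_begin":
            in_table = True
        elif key == "!series_matrix_table_end":
            in_table = False
        elif in_table:
            if header is None:
                header = row
            else:
                expression[key] = [
                    float(v) if v not in ("", "null", "NA") else None
                    for v in row[1:]
                ]
        elif key.startswith("!Sample_"):
            name = key[len("!Sample_"):]
            # Fields such as characteristics_ch1 repeat; add a suffix to
            # keep every occurrence instead of overwriting the first one.
            unique, n = name, 1
            while unique in samples:
                n += 1
                unique = f"{name}.{n}"
            samples[unique] = row[1:]

    sample_ids = header[1:] if header else samples.get("geo_accession", [])
    return sample_ids, samples, expression
```

With pandas installed, the results could be wrapped as `pd.DataFrame(samples, index=sample_ids)` for the sample metadata and `pd.DataFrame.from_dict(expression, orient="index", columns=sample_ids)` for the expression matrix. If adding a dependency is acceptable, a dedicated package such as GEOparse may be a better fit than a hand-written parser.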