How to download a GEO dataset from NCBI in Python
3.5 years ago

Hi everyone,

I want to download data from NCBI GEO. I want to run a query like the one below and store the results in three separate dataframes: the phenotypical information, the expression matrix, and the clinical information.

I normally do this in R, but I want to do it in Python.

# example R code I normally use
library("GEOquery")
GSE94499 <- getGEO('GSE94499', GSEMatrix=TRUE)
gse <- GSE94499$GSE94499_series_matrix.txt.gz
ch <- pData(gse)  # retrieve sample-level (clinical) information
f <- fData(gse)   # retrieve feature (probe annotation) information
y <- exprs(gse)   # retrieve expression matrix

Do you know of any library to do this in Python? Could you provide some sample code?

Thank you very much for your help.

python ncbi geodataset • 1.4k views
Is there any specific reason for using Python?

I am not sure whether a similar package exists, but you can solve this with rpy2 (a Python interface to R).

Actually yes: this has to go into a data pipeline that is written in Python, so it has to be Python code that I can put inside a function.
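
For a function like that, one option that may fit is the third-party GEOparse package. The following is only a minimal sketch, assuming GEOparse is installed (`pip install GEOparse`) and that the samples store their expression values in a column named `VALUE`. The function name `load_gse` is hypothetical, and the import sits inside the function so the dependency is only needed when it is called.

```python
def load_gse(accession, destdir="."):
    """Return (clinical, features, expression) for one GEO series.

    Sketch only: `load_gse` is an illustrative name, not part of GEOparse.
    """
    import GEOparse  # third-party package, assumed to be installed

    # Downloads the series into `destdir` (or reuses a cached copy) and parses it.
    gse = GEOparse.get_GEO(geo=accession, destdir=destdir)

    # One row per sample (GSM): roughly what pData() returns in R.
    clinical = gse.phenotype_data

    # Probes x samples matrix built from each sample's VALUE column,
    # roughly what exprs() returns in R.
    expression = gse.pivot_samples("VALUE")

    # One annotation table per platform (GPL) used by the series,
    # roughly what fData() returns in R.
    features = {name: gpl.table for name, gpl in gse.gpls.items()}

    return clinical, features, expression
```

Calling `clinical, features, expression = load_gse("GSE94499")` should give two pandas DataFrames plus a dict of DataFrames keyed by platform ID. If a series has a single platform, that dict has one entry.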

There are ways to integrate an R script with a Python script; Snakemake is one of them. Have you explored it (or any other, even simpler method, such as shell scripts)?

I would prefer not to embed an R script. The idea was to create a Python class that downloads the data and then does some pre-processing. I have embedded R scripts in other cases, but it was quite slow (at least in my experience). I also want to keep the code as short and simple as possible, since this is one step in a project shared with other people (all Python users).
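
A class along those lines can be sketched with only the standard library, by reading the series matrix file directly. This is a rough sketch under several assumptions: the class name `SeriesMatrix` and the helper `series_matrix_url` are made up for illustration, the URL follows the usual GEO FTP layout (series with several platforms use different file names), and the file is tab-separated with `!Sample_` metadata rows followed by a table between `!series_matrix_table_begin` and `!series_matrix_table_end`. Probe annotation is not in this file, so it is not covered here.

```python
import csv
import gzip
import io
import urllib.request


def series_matrix_url(accession):
    """Build the usual GEO FTP location of a series matrix file (assumed layout)."""
    stub = accession[:-3] + "nnn"  # e.g. GSE94499 -> GSE94nnn
    return (f"https://ftp.ncbi.nlm.nih.gov/geo/series/{stub}/{accession}"
            f"/matrix/{accession}_series_matrix.txt.gz")


def _to_float(value):
    """Convert a table cell to float; empty or non-numeric cells become NaN."""
    try:
        return float(value)
    except ValueError:
        return float("nan")


class SeriesMatrix:
    """Sample metadata and expression values from one series matrix file."""

    def __init__(self, samples, sample_info, probes, expression):
        self.samples = samples          # GSM ids, in column order
        self.sample_info = sample_info  # field name -> one value per sample
        self.probes = probes            # probe ids, in row order
        self.expression = expression    # one list of floats per probe

    @classmethod
    def from_text(cls, text):
        sample_info, header, probes, expression = {}, [], [], []
        in_table = False
        for row in csv.reader(io.StringIO(text), delimiter="\t"):
            if not row or not row[0]:
                continue
            key = row[0]
            if key == "!series_matrix_table_begin":
                in_table = True
            elif key == "!series_matrix_table_end":
                in_table = False
            elif in_table and not header:
                header = row  # first table row: "ID_REF" plus sample ids
            elif in_table:
                probes.append(key)
                expression.append([_to_float(v) for v in row[1:]])
            elif key.startswith("!Sample_"):
                field = key[len("!Sample_"):]
                # Repeated fields (e.g. characteristics_ch1) get a numeric suffix.
                name, n = field, 1
                while name in sample_info:
                    n += 1
                    name = f"{field}_{n}"
                sample_info[name] = row[1:]
        return cls(header[1:], sample_info, probes, expression)

    @classmethod
    def download(cls, accession):
        """Fetch and parse the series matrix for `accession` (needs network access)."""
        with urllib.request.urlopen(series_matrix_url(accession)) as response:
            return cls.from_text(gzip.decompress(response.read()).decode("utf-8"))
```

With `sm = SeriesMatrix.download("GSE94499")`, and assuming pandas is available, the expression matrix can be wrapped as `pd.DataFrame(sm.expression, index=sm.probes, columns=sm.samples)` and the clinical table as `pd.DataFrame(sm.sample_info, index=sm.samples)`.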
