Not Python but using EntrezDirect you can get:
$ esearch -db bioproject -query "GSE118723" | esummary | xtract -pattern DocumentSummary -element Project_Description
Quantification of gene expression levels at the single cell level has revealed that gene expression can vary substantially even across a population of homogeneous cells. However, it is currently unclear what genomic features control variation in gene expression levels, and whether common genetic variants may impact gene expression variation. Here, we take a genome-wide approach to identify expression variance quantitative trait loci (vQTLs). To this end, we generated single cell RNA-seq (scRNA-seq) data from induced pluripotent stem cells (iPSCs) derived from 53 Yoruba individuals. We collected data for a median of 95 cells per individual and a total of 5,447 single cells, and identified 241 mean expression QTLs (eQTLs) at 10% FDR, of which 82% replicate in bulk RNA-seq data from the same individuals. We further identified 14 vQTLs at 10% FDR, but demonstrate that these can also be explained as effects on mean expression. Our study suggests that dispersion QTLs (dQTLs), which could alter the variance of expression independently of the mean, have systematically smaller effect sizes than eQTLs. We estimate that at least 300 cells per individual and 400 individuals would be required to have modest power to detect the strongest dQTLs in iPSCs. These results will guide the design of future studies on understanding the genetic control of gene expression variance. Overall design: The goal of our study was to identify quantitative trait loci associated with gene expression variance across cells (vQTLs). Using the Fluidigm C1 platform, we isolated and collected scRNA-seq from 7,585 single cells from induced pluripotent stem cell (iPSC) lines of 54 Yoruba in Ibadan, Nigeria (YRI) individuals. We used unique molecular identifiers (UMIs) to tag RNA molecules and account for amplification bias in the single cell data (Islam et al., 2014). To estimate technical confounding effects without requiring separate technical replicates, we used a mixed-individual plate study design. 
The key idea of this approach is that having observations from the same individual under different confounding effects and observations from different individuals under the same confounding effect allows us to distinguish the two sources of variation (Tung et al., 2017).
As for the samples you can do something like:
$ esearch -db bioproject -query "GSE118723" | elink -target biosample | efetch | head -20
1: 11032017-C12-NA19226
Identifiers: BioSample: SAMN09855354; SRA: SRS3686681; GEO: GSM3341993
Organism: Homo sapiens
Attributes:
/source name="LCL-derived iPSC"
/experiment="11032017"
/well="C12"
/individual="NA19226"
/batch="b4"
Accession: SAMN09855354 ID: 9855354
2: 11062017-E01-NA19099
Identifiers: BioSample: SAMN09858071; SRA: SRS3689110; GEO: GSM3342102
Organism: Homo sapiens
Attributes:
/source name="LCL-derived iPSC"
/experiment="11062017"
/well="E01"
/individual="NA19099"
/batch="b4"
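Since the original question was about Python, here is a minimal sketch of how the plain-text report above could be turned into Python dictionaries. The function name parse_biosample_text and the dictionary layout are my own choices for illustration, not part of EntrezDirect, and the parser only assumes the line layout visible in the output above (a numbered title line, an "Identifiers:" line, an "Organism:" line and /key="value" attribute lines).

```python
import re

def parse_biosample_text(text):
    """Parse the plain-text BioSample report into a list of dicts.

    Hypothetical helper: it relies only on the layout shown in the thread.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.match(r"^(\d+):\s+(.*)$", line)
        if header:
            # A line such as "1: 11032017-C12-NA19226" starts a new record.
            current = {"title": header.group(2), "identifiers": {}, "attributes": {}}
            records.append(current)
        elif current is None:
            continue
        elif line.startswith("Identifiers:"):
            # "BioSample: X; SRA: Y; GEO: Z" becomes a small dict.
            for item in line[len("Identifiers:"):].split(";"):
                key, _, value = item.partition(":")
                if value:
                    current["identifiers"][key.strip()] = value.strip()
        elif line.startswith("Organism:"):
            current["organism"] = line[len("Organism:"):].strip()
        elif line.startswith("/"):
            # Attribute lines look like /individual="NA19226".
            attribute = re.match(r'^/([^=]+)="(.*)"$', line)
            if attribute:
                current["attributes"][attribute.group(1)] = attribute.group(2)
    return records

# Two records copied from the output above (the second one is cut off by head -20).
SAMPLE = '''1: 11032017-C12-NA19226
Identifiers: BioSample: SAMN09855354; SRA: SRS3686681; GEO: GSM3341993
Organism: Homo sapiens
Attributes:
/source name="LCL-derived iPSC"
/experiment="11032017"
/well="C12"
/individual="NA19226"
/batch="b4"
Accession: SAMN09855354 ID: 9855354
2: 11062017-E01-NA19099
Identifiers: BioSample: SAMN09858071; SRA: SRS3689110; GEO: GSM3342102
Organism: Homo sapiens
Attributes:
/source name="LCL-derived iPSC"
/experiment="11062017"
/well="E01"
/individual="NA19099"
/batch="b4"
'''

records = parse_biosample_text(SAMPLE)
for record in records:
    print(record["identifiers"]["GEO"], record["attributes"]["individual"])
```

Drop the head -20 when capturing real output, otherwise the last record will be truncated as it is here.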
Thank you very much! Is there any way to run it from a Jupyter-like environment? It seems it can be installed, but then I get: "/bin/bash: esearch: command not found"
See e.g. https://www.kaggle.com/alexandervc/entrezdirect
That may simply be a $PATH problem. Find out where esearch (and the other programs) are, then add that directory to your $PATH (export PATH=$PATH:/dir_with_entrezdirect_progs). You can also install using conda.
Thank you very much! Indeed, it seems to be a problem with PATH, but I cannot resolve it. I am trying !export PATH=/root/edirect/:$PATH
But it does not seem to change PATH. https://www.kaggle.com/alexandervc/entrezdirect?scriptVersionId=70892045&cellId=17
Conda install also does not seem to work properly on kaggle https://www.kaggle.com/alexandervc/entrezdirect?scriptVersionId=70892045&cellId=3
Is /root/edirect an actual directory? Are you able to do ls -l /root/edirect and get a listing? I assume the ! is because of the kaggle env that you are using. The command looks correct, yet it is not modifying the $PATH. Not sure if you need sudo access to make changes.
It seems '/root/edirect/' is a directory. Both !ls /root/edirect/ and os.listdir('/root/edirect/') work:
https://www.kaggle.com/alexandervc/entrezdirect?scriptVersionId=70900026&cellId=13
It looks like each command in that pipe is being executed in a separate shell and that is why you are losing the $PATH setting for those sub shells. Can you find a proper unix shell somewhere for this?
Thank you! "each command in that pipe is being executed in a separate shell" - very interesting idea, I never would have imagined it. Do you know whether it is the same in Jupyter notebooks? Colab? "Can you find a proper unix shell somewhere for this?" Any suggestions?
If it is a separate shell, then it is indeed one per command, not even one per notebook cell, since putting the commands in one notebook cell does not change anything: https://www.kaggle.com/alexandervc/entrezdirect?scriptVersionId=70919686&cellId=22
Separate shell is NOT desired for each command since the output of first command is being passed through second and then third. I just speculated that your kaggle environment is possibly using a sub-shell based on the error you are seeing.
I was referring to a unix shell that you can get via a proper unix machine, a virtual machine running linux/unix or even Windows Subsystem for Linux (WSL) on Windows.
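To illustrate the sub-shell speculation, here is a small sketch using plain subprocess calls, which I am assuming behave like separate ! lines in a notebook (each one starts a fresh shell). The variable name EDIRECT_DEMO_VAR is made up for the demo. An export done in one child shell is gone by the time the next one starts, whereas a change made to os.environ in the Python process is inherited by every child started afterwards.

```python
import os
import subprocess

# Make sure the demo variable is not already set in this process.
os.environ.pop("EDIRECT_DEMO_VAR", None)

# First child shell: the export only lives as long as this shell does.
subprocess.run("export EDIRECT_DEMO_VAR=set_in_child", shell=True, check=True)

# Second child shell: a fresh environment copied from the Python process,
# so the export from the first shell is not visible here.
child_view = subprocess.run(
    "echo ${EDIRECT_DEMO_VAR:-unset}",
    shell=True, capture_output=True, text=True, check=True,
).stdout.strip()
print(child_view)  # unset

# Changing os.environ modifies the parent process itself,
# so every shell started afterwards inherits the value.
os.environ["EDIRECT_DEMO_VAR"] = "set_in_parent"
parent_view = subprocess.run(
    "echo ${EDIRECT_DEMO_VAR:-unset}",
    shell=True, capture_output=True, text=True, check=True,
).stdout.strip()
print(parent_view)  # set_in_parent
```

Note that the commands of one pipeline written on a single line still run together in the same shell. It is only the environment change from an earlier, separate line that gets lost.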
Finally, it seems to work even on kaggle! There is a way to change PATH on kaggle which works:
os.environ['PATH'] = "/kaggle/working:" + os.environ['PATH']
So here is an example of how to use entrezdirect:
https://www.kaggle.com/alexandervc/entrezdirect?scriptVersionId=71645515
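For anyone who does not want to open the notebook, a rough sketch of the same idea in plain Python follows. The helper names (prepend_to_path, run_pipeline, fetch_project_description) are hypothetical and mine, not taken from the linked notebook. The directory /kaggle/working is the one from the post above and has to be replaced with wherever the EntrezDirect programs actually live. The esearch/esummary/xtract command is the one from the first answer.

```python
import os
import shlex
import shutil
import subprocess

def prepend_to_path(directory):
    """Put directory at the front of PATH for this process and its children."""
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep)
             if p and p != directory]
    os.environ["PATH"] = os.pathsep.join([directory] + parts)

def run_pipeline(command):
    """Run a shell pipeline in ONE shell and return its stdout as text."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout

def fetch_project_description(accession, edirect_dir="/kaggle/working"):
    """Return the BioProject description for an accession such as GSE118723.

    edirect_dir is an assumption taken from the thread; adjust it as needed.
    """
    prepend_to_path(edirect_dir)
    if shutil.which("esearch") is None:
        raise FileNotFoundError("esearch not found on PATH; check edirect_dir")
    command = (
        f"esearch -db bioproject -query {shlex.quote(accession)}"
        " | esummary"
        " | xtract -pattern DocumentSummary -element Project_Description"
    )
    return run_pipeline(command)
```

Calling fetch_project_description("GSE118723") should then return the same description text as the first command in this thread, provided EntrezDirect is installed in that directory and the machine has network access.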