2.1 years ago
Emily
Is there a way to flag nuclear-localised vs cytosol-localised genes in an AnnData object?
The scanpy tutorial shows how to do it for mitochondrial genes using adata.var['mt'] = adata.var.index.str.startswith('MT-'),
and for ribosomal genes I did:
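import pandas as pd
# KEGG_RIBOSOME gene set from MSigDB as plain text (first two lines are headers)
ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
ribo_genes = pd.read_table(ribo_url, skiprows=2, header=None)
ribo_genes
# Flag genes in adata.var whose names appear in the gene set
adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)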
But I'm not sure how to go about separating nuclear from cytosolic genes...
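For context, boolean flags like adata.var['mt'] and adata.var['ribo'] are usually passed to sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], inplace=True), which adds pct_counts_mt and pct_counts_ribo to adata.obs. Below is a minimal numpy sketch of that per-cell percentage, assuming a small dense toy matrix in place of adata.X (real data is often sparse); the helper name pct_counts_in and the toy counts are hypothetical. Note that 'MT-' is the human symbol convention; mouse symbols use 'mt-', which is why the sketch upper-cases names before matching.

```python
import numpy as np

def pct_counts_in(counts, gene_mask):
    """Percentage of each cell's total counts that falls in the flagged genes.

    counts: dense cells x genes array (toy stand-in for adata.X)
    gene_mask: boolean array over genes, e.g. adata.var['mt'].values
    """
    counts = np.asarray(counts, dtype=float)
    gene_mask = np.asarray(gene_mask, dtype=bool)
    total = counts.sum(axis=1)
    flagged = counts[:, gene_mask].sum(axis=1)
    # Cells with zero total counts get 0 instead of a division-by-zero NaN
    return np.divide(100.0 * flagged, total,
                     out=np.zeros_like(total), where=total > 0)

# Toy data: 3 cells x 4 genes (hypothetical counts)
genes = ["MT-CO1", "MT-ND1", "RPL13", "ACTB"]
counts = np.array([
    [10, 10, 30, 50],
    [0, 0, 0, 0],
    [5, 15, 20, 60],
])

# upper() also lets mouse-style "mt-" symbols match the human "MT-" prefix
mt_mask = np.array([g.upper().startswith("MT-") for g in genes])
# Stand-in for the KEGG_RIBOSOME list loaded from MSigDB
ribo_mask = np.isin(genes, ["RPL13"])

print(pct_counts_in(counts, mt_mask))
print(pct_counts_in(counts, ribo_mask))
```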
I think you need to clarify what you mean by nuclear-localised vs cytosol-localised genes. Do you mean the localization of the genes themselves, or the localization of their transcribed (and potentially translated) products? And which organism(s) are you working with?
I ask because, if I interpret your code snippets correctly, you are after the gene products, and many mitochondrial proteins are actually encoded in the nucleus: they are transcribed there, translated in the cytosol and then imported into mitochondria. So your approach of selecting only the genes that originate from the mitochondrial DNA (the MT- prefix) falls short.
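To make this concrete: a prefix match only catches genes located on the mtDNA (in human, 13 protein-coding genes plus the mitochondrial rRNA and tRNA genes), while nuclear-encoded subunits of the same respiratory complexes are missed. The toy comparison below uses a hand-written, deliberately simplified gene-to-compartment mapping (illustrative only, not real GO data) to show the gap.

```python
# Toy gene -> cellular-compartment mapping (simplified and hypothetical;
# real genes often carry several cellular-component annotations)
toy_cc = {
    "MT-CO1": {"mitochondrion"},  # encoded on mtDNA (complex IV)
    "MT-ND1": {"mitochondrion"},  # encoded on mtDNA (complex I)
    "COX4I1": {"mitochondrion"},  # nuclear-encoded complex IV subunit
    "NDUFA1": {"mitochondrion"},  # nuclear-encoded complex I subunit
    "ACTB": {"cytosol"},
}

# What the scanpy-tutorial flag captures: only mtDNA-encoded genes
by_prefix = {g for g in toy_cc if g.startswith("MT-")}
# What an annotation-based flag captures: everything mapped to mitochondria
by_annotation = {g for g, terms in toy_cc.items() if "mitochondrion" in terms}

missed = sorted(by_annotation - by_prefix)
print("flagged by prefix:", sorted(by_prefix))
print("missed by prefix:", missed)
```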
Annotating with Gene Ontology (GO) might give you an idea, since it provides a hierarchical controlled vocabulary split into three categories (Biological Process, Molecular Function and Cellular Component), with evidence codes for each annotation. Terms such as nucleus and cytosol belong to the Cellular Component category.
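One possible way to turn GO Cellular Component annotations into nuclear/cytosolic flags is sketched below, assuming you have a GAF 2.x annotation file for your organism (tab-separated, '!' comment lines; column 3 is the gene symbol, 4 the qualifier, 5 the GO ID, 7 the evidence code, 9 the aspect, with 'C' meaning Cellular Component). The function names are hypothetical, and the toy GAF lines use made-up gene symbols. Caveats: the sketch only uses direct annotations and does not propagate up the GO hierarchy, so a gene annotated only to a child term such as nucleoplasm (GO:0005654) will not count as nuclear unless you add that step; the two categories also overlap, since many proteins are annotated to both compartments.

```python
from collections import defaultdict

NUCLEUS = "GO:0005634"  # cellular component: nucleus
CYTOSOL = "GO:0005829"  # cellular component: cytosol

def cc_terms_from_gaf(lines, exclude_evidence=()):
    """Map gene symbol -> set of directly annotated cellular-component GO IDs."""
    terms = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id = cols[2], cols[3], cols[4]
        evidence, aspect = cols[6], cols[8]
        # Keep only positive (non-NOT) cellular-component annotations
        if aspect != "C" or "NOT" in qualifier.split("|"):
            continue
        if evidence in exclude_evidence:
            continue  # e.g. drop electronic annotations with {"IEA"}
        terms[symbol].add(go_id)
    return terms

def split_nuclear_cytosolic(terms, genes):
    """Return (nuclear, cytosolic) gene sets; a gene may be in both."""
    nuclear = {g for g in genes if NUCLEUS in terms.get(g, ())}
    cytosolic = {g for g in genes if CYTOSOL in terms.get(g, ())}
    return nuclear, cytosolic

def toy_gaf_line(symbol, qualifier, go_id, evidence):
    """Build a made-up 17-column GAF line for illustration only."""
    cols = ["UniProtKB", "ID_" + symbol, symbol, qualifier, go_id, "REF",
            evidence, "", "C", "", "", "protein", "taxon:9606", "20240101",
            "SRC", "", ""]
    return "\t".join(cols)

toy = [
    "!gaf-version: 2.2",
    toy_gaf_line("GENE_A", "located_in", NUCLEUS, "IDA"),
    toy_gaf_line("GENE_B", "located_in", CYTOSOL, "IDA"),
    toy_gaf_line("GENE_C", "located_in", NUCLEUS, "IEA"),
    toy_gaf_line("GENE_C", "located_in", CYTOSOL, "IDA"),
    toy_gaf_line("GENE_D", "NOT|located_in", NUCLEUS, "IDA"),
]
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

terms = cc_terms_from_gaf(toy)
nuclear, cytosolic = split_nuclear_cytosolic(terms, genes)
print("nuclear:", sorted(nuclear))
print("cytosolic:", sorted(cytosolic))
```

With an AnnData object, the resulting sets can be used like the mt/ribo flags, e.g. adata.var['nuclear'] = adata.var_names.isin(nuclear). Passing exclude_evidence={"IEA"} is one way to restrict to non-electronic annotations if you want a stricter set.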