Hello!
I am following this scanpy pipeline for scRNA-seq quality control: https://github.com/mousepixels/sanbomics_scripts/blob/main/Scanpy_intro_pp_clustering_markers.ipynb
For annotating the group of mitochondrial genes, in the pipeline they use the command:
adata.var['mt'] = adata.var_names.str.startswith('MT-')
where adata.var_names contains the gene names. However, in my dataset the mitochondrial genes do not share a starting pattern such as 'MT-'. I have them all in a txt file, which looks like this:
mt-nd3
mt-nd4
mt-nd4l
mt-nd5
mt-nd6
NC_002333.1
NC_002333.10
NC_002333.11
NC_002333.12
My idea was to load the txt in the notebook:
x = open('michondrialgenesDR11.txt', 'r')
mitogenes = x.read()
and now assign these specific genes to adata.var['mt']. However, since they do not all start the same way, I am not sure how to assign them all to that variable.
Can anyone help? Thanks a lot.
EDIT - code for when the mitochondrial genes all start with 'MT-', annotating this group of genes as 'mt':
adata = sc.read_10x_mtx(
'tutorial_sample/outs/filtered_feature_bc_matrix/',
var_names='gene_symbols',
cache=True)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
The problem is for when not all the mitochondrial genes have the same beginning.
In the pipeline, adata.var_names contains the genes of a count matrix that has been converted into adata (an AnnData object), and the mitochondrial ones are those that start with MT- (.str.startswith('MT-')). For me, the 'mt' annotation has to cover all the genes that are in the txt file mentioned above.
Thank you!
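One possible way to do this is to match on exact gene names instead of a prefix. The sketch below is a minimal illustration, not a tested part of the linked pipeline. It assumes the txt file holds one gene name per line and that those names are spelled exactly as in adata.var_names. The helper names read_gene_list and flag_genes are hypothetical, and a small pandas Index stands in for adata.var_names so the example runs without scanpy.

```python
import pandas as pd


def read_gene_list(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def flag_genes(var_names, genes):
    """Boolean array: True where a name in var_names is in genes (exact match)."""
    return pd.Index(var_names).isin(genes)


# Stand-in for adata.var_names (hypothetical values for illustration only)
var_names = pd.Index(["mt-nd3", "actb1", "NC_002333.1", "NC_002333.10", "gapdh"])

# Stand-in for the contents of the txt file
mito_genes = ["mt-nd3", "mt-nd4", "NC_002333.1", "NC_002333.10"]

mt_mask = flag_genes(var_names, mito_genes)
print(mt_mask.tolist())  # [True, False, True, True, False]

# With a real AnnData object, the same idea would read (assumption, untested here):
# mito_genes = read_gene_list('michondrialgenesDR11.txt')
# adata.var['mt'] = adata.var_names.isin(mito_genes)
```

Exact matching is case-sensitive, so it is worth checking that adata.var['mt'].sum() is close to the number of names in the file before computing QC metrics.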
Use the 10101 edit button to format relevant text as code in future. I have done it for you this time. If you take a look at https://stackoverflow.com/questions/20461847/str-startswith-with-a-list-of-strings-to-test-for you will see that startswith accepts several strings to test for, so you could try specifying multiple prefixes.
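As a rough sketch of the multiple-prefix idea: Python's built-in str.startswith accepts a tuple of prefixes, so it can be applied to each gene name in turn. The two prefixes below are assumptions read off the sample list in the question, and a plain list stands in for adata.var_names. Whether every name with these prefixes is really mitochondrial needs to be checked against the full txt file.

```python
# Prefixes assumed from the sample gene list in the question (not verified)
prefixes = ("mt-", "NC_002333.")

# Stand-in for adata.var_names (hypothetical values for illustration only)
var_names = ["mt-nd3", "actb1", "NC_002333.1", "mtor", "NC_002333.12"]

# str.startswith takes a tuple and returns True if any prefix matches
mt_mask = [name.startswith(prefixes) for name in var_names]
print(mt_mask)  # [True, False, True, False, True]

# With a real AnnData object, the same idea would read (assumption, untested here):
# adata.var['mt'] = [name.startswith(prefixes) for name in adata.var_names]
```

Note that "mtor" is not flagged because the prefix includes the hyphen. If the names in the file do not all share a small set of prefixes, matching against the exact names from the file is the safer option.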
cross-posted: https://stackoverflow.com/questions/75172257/