Is there a command to annotate genes from a txt file?
24 months ago
mt_pereira

Hello!

I am following this scanpy pipeline for scRNA-seq quality control: https://github.com/mousepixels/sanbomics_scripts/blob/main/Scanpy_intro_pp_clustering_markers.ipynb

To annotate the group of mitochondrial genes, the pipeline uses the command:

adata.var['mt'] = adata.var_names.str.startswith('MT-')

where adata.var_names contains the gene names. However, in my dataset the mitochondrial genes do not share a common prefix such as 'MT-'. I have them all in a txt file, which looks like this:

mt-nd3
mt-nd4
mt-nd4l
mt-nd5
mt-nd6
NC_002333.1
NC_002333.10
NC_002333.11
NC_002333.12

My idea was to load the txt file in the notebook:

# Read the file and split it into a list with one gene name per entry
with open('michondrialgenesDR11.txt', 'r') as x:
    mitogenes = x.read().split()

and then assign these specific genes to adata.var['mt']. However, since they do not share the same prefix, I am not sure how to assign them all to that column.
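Since the file lists exact gene names rather than a shared prefix, a membership test fits better than startswith. Below is a minimal sketch, assuming the txt file has one gene name per line: it reads the names into a set and flags every var name found in that set with pandas' Index.isin. To keep it runnable without the real data, an io.StringIO stands in for the file and a small DataFrame indexed by gene names stands in for adata.var; read_gene_list is a hypothetical helper, and 'sox2' and 'actb1' are illustrative non-mitochondrial names.

```python
import io

import pandas as pd


def read_gene_list(handle):
    """Hypothetical helper: return the set of non-empty, stripped lines."""
    return {line.strip() for line in handle if line.strip()}


# Stand-in for open('michondrialgenesDR11.txt') so the sketch runs on its own.
gene_file = io.StringIO("mt-nd3\nmt-nd4\nmt-nd4l\nNC_002333.1\nNC_002333.10\n")
mitogenes = read_gene_list(gene_file)

# Stand-in for adata.var: a DataFrame whose index plays the role of adata.var_names.
var = pd.DataFrame(index=["sox2", "mt-nd3", "actb1", "NC_002333.10", "mt-nd4l"])

# Index.isin gives one boolean per gene: True where the name is in the set.
var["mt"] = var.index.isin(mitogenes)
print(var["mt"].tolist())  # [False, True, False, True, True]
```

With a real AnnData object the last step would be adata.var['mt'] = adata.var_names.isin(mitogenes), after which the column can be passed to sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True). The match is exact, so the names in the file must agree with var_names in case and in any version suffix.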

Can anyone help? Thanks a lot.

EDIT - here is the code for when the mitochondrial genes all start with 'MT-', which annotates this group of genes as 'mt':

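import scanpy as sc

adata = sc.read_10x_mtx(
    'tutorial_sample/outs/filtered_feature_bc_matrix/',
    var_names='gene_symbols',  # use gene symbols (not Ensembl IDs) as var_names
    cache=True)
# Flag genes whose symbol starts with 'MT-' (the human naming convention)
adata.var['mt'] = adata.var_names.str.startswith('MT-')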

The problem arises when the mitochondrial genes do not all share the same prefix.

In the pipeline, adata.var_names holds the gene names of the count matrix after it has been converted into an AnnData object (adata), and the mitochondrial genes are the ones that start with MT- (.str.startswith('MT-')). In my case, adata.var['mt'] has to flag all the genes listed in the txt file mentioned above.

Thank you!


In future, please use the 10101 button in the editor to format relevant text as code. I have done it for you this time.


According to https://stackoverflow.com/questions/20461847/str-startswith-with-a-list-of-strings-to-test-for, str.startswith accepts a tuple of prefixes, so you could try:

adata.var['mt'] = adata.var_names.str.startswith(('mt-', 'NC_'))
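The tuple form works because pandas' str.startswith accepts a tuple of prefixes (recent pandas versions document this). Matching is case-sensitive, though, so the prefixes have to match the lowercase mt- used in this dataset. A small sketch on a hypothetical list of gene names:

```python
import pandas as pd

# Hypothetical gene names mixing upper- and lowercase mitochondrial styles.
names = pd.Index(["mt-nd3", "NC_002333.1", "sox2", "MT-CO1", "ncam1a"])

# Case-sensitive: only names written exactly as 'MT-...' or 'NC...' match.
upper_only = names.str.startswith(("MT-", "NC"))

# Lower-casing first makes the test case-insensitive; keeping the underscore
# in 'nc_' stops a gene such as 'ncam1a' from matching a bare 'nc'.
any_case = names.str.lower().str.startswith(("mt-", "nc_"))

print(upper_only.tolist())  # [False, True, False, True, False]
print(any_case.tolist())    # [True, True, False, True, False]
```

Prefixes remain a heuristic: any non-mitochondrial gene that happens to share a prefix is flagged too, which is why an explicit list or the annotation file is safer.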
24 months ago
ATpoint

The most generic way would be to get the GTF file that this dataset's annotation is based on and then filter for genes located on the mitochondrial chromosome. The mt- (or uppercase MT-) prefix is convenient but not systematic: gene names could also be replaced by Ensembl gene IDs (ENS...), and then you cannot derive anything from prefixes. In your example the prefix is lowercase mt-, but you're using an uppercase query string.
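A minimal sketch of the GTF approach in plain Python, under stated assumptions: it keeps 'gene' records whose seqname (column 1) is in a given set of mitochondrial sequence names and collects their gene_name attribute, falling back to gene_id. The seqname depends on the annotation source (Ensembl typically uses MT, while RefSeq-based annotations use an accession), so the default set, the helper name mito_genes_from_gtf and the sample GTF lines below are all hypothetical.

```python
import io
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def mito_genes_from_gtf(handle, mito_seqnames=frozenset({"MT"})):
    """Hypothetical helper: names of 'gene' records on mitochondrial seqnames."""
    genes = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        seqname, feature, attributes = fields[0], fields[2], fields[8]
        if seqname not in mito_seqnames or feature != "gene":
            continue
        attrs = dict(ATTR_RE.findall(attributes))
        # Prefer the gene symbol; fall back to the stable ID if there is none.
        name = attrs.get("gene_name", attrs.get("gene_id"))
        if name is not None:
            genes.add(name)
    return genes


# Hypothetical tab-separated GTF excerpt, for illustration only.
sample_gtf = io.StringIO(
    "#!genome-build example\n"
    'MT\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "mt-nd1";\n'
    'MT\tsrc\texon\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "mt-nd1";\n'
    '1\tsrc\tgene\t100\t900\t.\t-\t.\tgene_id "GENE0002"; gene_name "sox2";\n'
    'MT\tsrc\tgene\t950\t1020\t.\t+\t.\tgene_id "GENE0003";\n'
)
print(sorted(mito_genes_from_gtf(sample_gtf)))  # ['GENE0003', 'mt-nd1']
```

The resulting set can then be used with adata.var_names.isin(...) exactly like a list read from a txt file; if var_names holds Ensembl IDs rather than symbols, collect gene_id instead of gene_name.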
