Identification of human genes?
5.1 years ago
sp29 ▴ 50

I have a set of ~10,000 gene symbols. The data contains some noise in the form of wrong gene symbols (genes that do not belong to human). I need to keep only the symbols that belong to the human genome. Which Python or R library could help me do this?

R gene python human genes • 1.1k views
5.1 years ago
caggtaagtat ★ 1.9k

If your PC's network access is not restricted by a firewall, you can use the R package biomaRt to retrieve the human gene names listed on ensembl.org. See the biomaRt manual for details.

5.1 years ago

You can do this in any scripting language. Download the list of approved human gene symbols from the HGNC web site and filter out of your list all symbols that don't match the HGNC ones.
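As a minimal Python sketch of this filtering step: it assumes the HGNC download is a tab-separated table with a header row and a column named `symbol` (check this against the file you actually download). The function names and the tiny in-memory table are hypothetical stand-ins for illustration only.

```python
import csv
import io


def load_approved_symbols(handle, column="symbol"):
    """Collect the values of one column from a tab-separated table with a header."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column].strip() for row in reader if row.get(column)}


def split_by_approval(candidates, approved):
    """Split candidate symbols into (kept, rejected) by exact, case-sensitive match."""
    kept, rejected = [], []
    for symbol in candidates:
        cleaned = symbol.strip()
        if cleaned in approved:
            kept.append(cleaned)
        else:
            rejected.append(cleaned)
    return kept, rejected


# Illustrative stand-in for the downloaded HGNC table (not real file contents).
demo_table = io.StringIO("symbol\tstatus\nTP53\tApproved\nBRCA1\tApproved\n")
approved = load_approved_symbols(demo_table)

kept, rejected = split_by_approval(["TP53", "Trp53", "BRCA1", "notagene"], approved)
print(kept)      # symbols found in the approved set
print(rejected)  # symbols to inspect or discard
```

For a real run, replace `demo_table` with an open file handle to the downloaded table. Keeping the rejected symbols in their own list makes it easy to check whether some of them are outdated aliases rather than non-human genes.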


You can go to GENCODE, download the human gene annotation file, and intersect your genes with the genes in that file using the R function intersect(). Alternatively, go to Ensembl BioMart and convert the IDs there.
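The same intersection can be sketched in Python, assuming the annotation is a GTF file whose ninth column carries a `gene_name "..."` attribute on `gene` rows. The two annotation lines below are made-up placeholders (coordinates and IDs are not real), and `gene_names_from_gtf` is a hypothetical helper name.

```python
import re

# Matches the gene_name attribute in the GTF attribute column.
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gene_names_from_gtf(lines):
    """Return the set of gene_name values found on 'gene' feature rows."""
    names = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        match = GENE_NAME.search(fields[8])
        if match:
            names.add(match.group(1))
    return names


# Placeholder GTF content for illustration only; coordinates and IDs are invented.
demo_gtf = [
    "##description: placeholder annotation\n",
    'chr17\tSRC\tgene\t1\t100\t.\t-\t.\tgene_id "ID_PLACEHOLDER_1"; gene_name "TP53";\n',
    'chr17\tSRC\tgene\t200\t300\t.\t-\t.\tgene_id "ID_PLACEHOLDER_2"; gene_name "BRCA1";\n',
]

my_genes = ["TP53", "Trp53", "BRCA1", "notagene"]
human_genes = sorted(set(my_genes) & gene_names_from_gtf(demo_gtf))
print(human_genes)
```

With a real file, pass an open file handle instead of `demo_gtf` (for a gzip-compressed download, `gzip.open(path, "rt")` yields text lines). The set intersection plays the role of R's `intersect()`.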
