Hi, I need a list of all the protein-coding genes from mouse, with gene name and corresponding Entrez Gene ID. Can somebody help?
Thanks
1) Simple way:
Example: Mus musculus
For other organisms/species, use the codes from the following link:
http://rest.kegg.jp/list/organism
2) With some effort using UCSC (Select Mus musculus as reference in your case):
A: I need to download a list of all human genes with their respective Esemble gene
The most up-to-date and comprehensive reference for Entrez Gene IDs, which also contains the status of the genes, is:
http://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz (which you have to filter by taxon ID 10090 for Mus musculus)
or, more conveniently: http://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz
The table also contains the gene name (or, more correctly, gene names separated by naming institution) and the gene type (column type_of_gene, which you have to filter for "protein-coding").
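A minimal sketch of that filtering step in Python, using only the standard library. It assumes the file has already been downloaded and that its tab-separated header uses the column names #tax_id, GeneID, Symbol and type_of_gene. The function names and the tiny inline sample (made-up rows, not real genes) are only for illustration.

```python
import csv
import gzip
import io


def protein_coding_genes(lines, tax_id="10090"):
    """Yield (GeneID, Symbol) for protein-coding genes of one taxon.

    `lines` is any iterable of text lines in gene_info format
    (tab-separated, header line starting with '#tax_id').
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["#tax_id"] == tax_id and row["type_of_gene"] == "protein-coding":
            yield row["GeneID"], row["Symbol"]


def read_gene_info(path):
    """Open a locally downloaded gene_info.gz file as text (hypothetical helper)."""
    return gzip.open(path, "rt", encoding="utf-8", newline="")


# Made-up rows that only mimic the layout of gene_info; the IDs and
# symbols are placeholders, not real Entrez entries.
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\ttype_of_gene\n"
    "10090\t1\tGeneA\tprotein-coding\n"
    "10090\t2\tGeneB\tncRNA\n"
    "9606\t4\tGeneD\tprotein-coding\n"
    "10090\t3\tGeneC\tprotein-coding\n"
)

genes = list(protein_coding_genes(io.StringIO(SAMPLE)))
print(genes)
```

On the real file you would pass read_gene_info("Mus_musculus.gene_info.gz") instead of the inline sample and write the resulting pairs to a text file.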
Thanks all,
Actually, I have an RNA-seq data set and I would like to remove all the non-coding genes. Is there any available tool for that?
Strategy for R:
Assuming you have a count table with genes in rows and samples in columns, you can easily read this table in as a data.frame counts. Analogously, you can read the list of genes you want to keep (or remove, it doesn't matter) into a vector keepgenes.
Then you can easily slice the counts data.frame with something like newdata <- counts[counts$gene %in% keepgenes, ] (this assumes the gene identifiers are in a column named gene; if they are the row names, use rownames(counts) instead of counts$gene).
If this strategy/pseudocode is unclear to you, I can expand on it, but this is really basic R, and you'll make your RNA-seq analysis far less painful if you dive into an R tutorial and teach yourself how to perform these tasks. Knowledge of at least one programming language is an enormous advantage.
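For readers who are not working in R, here is a rough sketch of the same slicing idea in plain Python. It assumes a tab-separated count table whose first column is named gene. The function name, the column name and the inline sample values are made up for illustration.

```python
import csv
import io


def filter_counts(rows, keep_genes, gene_column="gene"):
    """Return only the rows whose gene identifier is in keep_genes."""
    keep = set(keep_genes)  # set lookup keeps the filter fast for long lists
    return [row for row in rows if row[gene_column] in keep]


# Made-up count table: genes in rows, samples in columns.
COUNTS = (
    "gene\tsample1\tsample2\n"
    "GeneA\t10\t12\n"
    "GeneB\t0\t3\n"
    "GeneC\t7\t5\n"
)

rows = list(csv.DictReader(io.StringIO(COUNTS), delimiter="\t"))
kept = filter_counts(rows, ["GeneA", "GeneC"])
print([row["gene"] for row in kept])
```

To remove genes instead of keeping them, the condition would simply be negated (not in keep).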
Hi
You can easily use Ensembl BioMart:
http://www.ensembl.org/biomart/martview/3e6bde1e77a85c663750a6367619f66f
Regards
Esmaeil
This is a dumb question, but I just wanted to double check: this is independent of the reference genome (mm9, mm10), correct?
Hi Steve,
Good point. Yes, I am using mm10.