gene list for RNA seq
8.4 years ago
pshubhamoy ▴ 20

Hi, I need a list of all the protein-coding genes from mouse, with the gene name and corresponding Entrez Gene ID for each. Can somebody help?

Thanks

RNA-Seq • 2.5k views
8.4 years ago
EagleEye 7.6k

1) The simple way:

Example for Mus musculus:

http://rest.kegg.jp/list/mmu

For other organisms/species, use the codes from the following link:

http://rest.kegg.jp/list/organism

2) With some more effort, using UCSC (select Mus musculus as the reference in your case):

A: I need to download a list of all human genes with their respective Esemble gene


This is a dumb question, but I just wanted to double-check: this is independent of the reference genome (mm9, mm10), correct?


Hi Steve,

Good point. Yes, I am using mm10.

8.4 years ago
unksci ▴ 180

The most up-to-date and comprehensive reference for Entrez Gene IDs, which also contains the status of the genes, is:

http://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz (which you have to filter by taxon ID 10090 for Mus musculus)

or, more conveniently: http://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz

The table also contains the gene name (or, more correctly, the gene names separated by naming institution) and the gene type (which you have to filter for "protein-coding").
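The filtering described above could be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function names `protein_coding_genes` and `read_gene_info` are hypothetical, and the column names `tax_id`, `GeneID`, `Symbol` and `type_of_gene` are assumed to match the header line of the gene_info file, so check them against the file you actually download. The sample rows are made up for illustration and show only four of the columns.

```python
import csv
import gzip
import io

MOUSE_TAX_ID = "10090"  # taxon ID for Mus musculus, as given in the answer above


def protein_coding_genes(lines, tax_id=MOUSE_TAX_ID):
    """Yield (GeneID, Symbol) pairs for protein-coding genes of one taxon.

    `lines` is any iterable of tab-separated text lines whose first line
    is the header (assumed to start with "#tax_id").
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Strip the leading "#" from the header so the names are usable as keys.
    header = [name.lstrip("#") for name in next(reader)]
    col = {name: index for index, name in enumerate(header)}
    for row in reader:
        if len(row) < len(header):
            continue  # skip blank or truncated lines
        if row[col["tax_id"]] == tax_id and row[col["type_of_gene"]] == "protein-coding":
            yield row[col["GeneID"]], row[col["Symbol"]]


def read_gene_info(path, tax_id=MOUSE_TAX_ID):
    """Read a locally downloaded, gzip-compressed gene_info file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(protein_coding_genes(handle, tax_id))


# Made-up rows with a reduced set of columns, for illustration only.
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\ttype_of_gene\n"
    "10090\t111\tGeneA\tprotein-coding\n"
    "10090\t222\tGeneB\tncRNA\n"
    "9606\t333\tGeneC\tprotein-coding\n"
)
sample_result = list(protein_coding_genes(io.StringIO(SAMPLE)))
print(sample_result)
```

With a downloaded file, the call would look like `read_gene_info("Mus_musculus.gene_info.gz")`. Looking columns up by header name rather than by position keeps the sketch working even if the column order differs from what is assumed here.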

8.4 years ago
pshubhamoy ▴ 20

Thanks all,

Actually, I have an RNA-seq data set and I would like to remove all the non-coding genes. Is there a tool available for that?


This is something that you could accomplish fairly easily in any programming language, e.g. R or Python. However, the specific implementation would depend heavily on the structure of your data.


Strategy for R:

Assuming you have a count table with genes in rows and samples in columns, you can read this table in as a data.frame called counts. Analogously, you can read the list of genes you want to keep (or remove; the approach is the same) into a vector called keepgenes.

Then you can slice the counts data.frame with something like newdata <- counts[counts$gene %in% keepgenes, ]

If this strategy/pseudocode is unclear to you, I can expand on it, but this is really basic R, and you'll make your RNA-seq analysis far less painful if you dive into an R tutorial and teach yourself how to perform these tasks. Knowledge of at least one programming language is an enormous advantage.
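For readers who prefer Python, the same strategy might look like the sketch below, which uses only the standard library. It assumes a tab-separated count table whose gene identifier column is named `gene`; the function name `filter_counts`, the gene names and the counts are all hypothetical and only illustrate the idea.

```python
import csv
import io


def filter_counts(rows, keep_genes, gene_column="gene"):
    """Return only the rows whose gene is in keep_genes."""
    keep = set(keep_genes)  # set membership is fast for long gene lists
    return [row for row in rows if row[gene_column] in keep]


# Made-up count table: genes in rows, samples in columns.
counts_file = io.StringIO(
    "gene\tsample1\tsample2\n"
    "GeneA\t10\t12\n"
    "GeneB\t0\t3\n"
    "GeneC\t7\t5\n"
)
counts = list(csv.DictReader(counts_file, delimiter="\t"))

keepgenes = ["GeneA", "GeneC"]  # e.g. the protein-coding genes
newdata = filter_counts(counts, keepgenes)
print([row["gene"] for row in newdata])  # prints ['GeneA', 'GeneC']
```

To remove the listed genes instead of keeping them, the condition would be negated (`not in keep`). For a real file, `counts_file` would be replaced by an opened file handle.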

8.4 years ago
ebrahimiet ▴ 50

Hi

You can easily use Ensembl BioMart:

http://www.ensembl.org/biomart/martview/3e6bde1e77a85c663750a6367619f66f

Regards,

Esmaeil
