Database to map gene ID to the chromosome where it is located?
9.6 years ago

I am looking for a database, preferably with a .csv or .txt dump, that can map gene identifiers (name, gene ID, UniProt ID, EMBL accession number, ...) to the chromosome each gene is assigned to in its organism.

This information is usually shown on the UniProt web page for a protein, but as far as I know it is absent from the original .txt data dump.

9.6 years ago

One way to do it is to download the gene annotation archive from your source of choice with wget or curl, filter it for gene features with awk, convert the result to BED, and then use awk again to print the fourth and first columns (the ID and chromosome values), e.g.:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3=="gene"' - \
    | convert2bed -i gff - \
    | awk -v OFS='\t' '{print $4, $1}' - \
    > gene_id_and_chromosome.txt

Thank you for your answer! I see that the same resource can be used to do the same thing for mouse. Is there a way to retrieve the mapping for Saccharomyces cerevisiae?


Take a look at the GFF or GTF files in the archives here, maybe this will help: http://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/?C=M;O=D
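
If convert2bed is not at hand, roughly the same ID-to-chromosome table can be pulled out of a GFF3 file with awk alone. The following is only a sketch: it assumes the file is GFF3 with an ID= attribute in column 9 and that it may end with an embedded ##FASTA section. The function name gff_gene_to_chrom and the sample records (GENE_A, GENE_B) are made up for illustration and are not taken from any real annotation.

```shell
#!/bin/sh
# Print "gene ID <TAB> chromosome" for every gene feature in a GFF3 stream.
# Hypothetical helper; reads GFF3 on stdin.
gff_gene_to_chrom() {
    awk -F'\t' -v OFS='\t' '
        /^##FASTA/ { exit }   # stop before any embedded sequence section
        /^#/       { next }   # skip header and comment lines
        $3 == "gene" {
            id = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
            if (id != "") print id, $1
        }
    '
}

# Made-up sample input: two gene features, one CDS, and a FASTA tail.
{
    printf '##gff-version 3\n'
    printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=GENE_A;Name=GENE_A\n'
    printf 'chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=GENE_A_mRNA\n'
    printf 'chrII\tSGD\tgene\t9091\t9582\t.\t+\t.\tID=GENE_B;Name=GENE_B\n'
    printf '##FASTA\n>chrI\nACGT\n'
} | gff_gene_to_chrom
# Prints:
# GENE_A	chrI
# GENE_B	chrII
```

On a real download, the sample input would be replaced by something like `gunzip --stdout file.gff.gz | gff_gene_to_chrom`, where file.gff.gz is whichever archive you picked.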


Andrei! Hello from Seattle! I'm looking for a way to find the chromosome location for the entries in the uniprot.dat file. Any chance you know where I can find that?


Hello Summer, hope you are doing well there! See my answer, I hope it helps.

9.6 years ago
TriS ★ 4.7k

bioDBnet does what you want


Could you be a little bit more explicit?

9.6 years ago

It seems that UniProt has "Proteomes" objects, some of which map to chromosomes.

In the UniProt flat file, this is available in the EMBL cross-reference lines (DR   EMBL; BKXXXXX; ...; ...; ...) for yeast; for human, the accessions look like CMXXXX.

A mapping from the BKXXX (CMXXX) accessions to chromosomes seems to be obtainable manually from the UniProt proteome links.

The mapping also seems to be readily available in Bioconductor, among its gene location mapping tools.
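
To illustrate the flat-file route, here is a rough sketch that pulls the EMBL cross-references out of a uniprot.dat-style stream and joins them against a small accession-to-chromosome table. Everything named here is an assumption for illustration: the helper functions uniprot_embl_xrefs and add_chromosome, the two-column mapping file (which you would have to build yourself, e.g. from the proteome pages), and the placeholder accessions P00001 and BK000001.

```shell
#!/bin/sh
# Print "primary accession <TAB> EMBL accession" for every DR EMBL line.
# Hypothetical helper; reads UniProt flat-file records on stdin.
uniprot_embl_xrefs() {
    awk -v OFS='\t' '
        /^AC / && acc == "" { acc = $2; sub(/;$/, "", acc) }   # first accession of the record
        /^DR   EMBL;/       { embl = $3; sub(/;$/, "", embl); print acc, embl }
        /^\/\//             { acc = "" }                       # end of record
    '
}

# Keep only cross-references found in the mapping file ($1), and print
# "primary accession <TAB> chromosome".
add_chromosome() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR    { chrom[$1] = $2; next }   # first file: EMBL accession -> chromosome
        $2 in chrom  { print $1, chrom[$2] }
    ' "$1" -
}

# Made-up mapping table and flat-file record, for illustration only.
map=$(mktemp)
printf 'BK000001\tchrI\n' > "$map"
{
    printf 'AC   P00001;\n'
    printf 'DR   EMBL; BK000001; DAA00001.1; -; Genomic_DNA.\n'
    printf '//\n'
} | uniprot_embl_xrefs | add_chromosome "$map"
rm -f "$map"
# Prints:
# P00001	chrI
```

Entries usually carry several EMBL cross-references (mRNA, genomic clones, and so on), so only the ones that appear in the mapping table produce a line of output.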
