Is There A Single Database Where I Can Find All The Human Gene Accessions And Sequences?
13.2 years ago
Firoz ▴ 10

I am looking for all human gene sequences, specifically the coding regions. Is there a database from which I can download all of them? I know I can go to NCBI or EMBL and search for individual genes, but what I need is either a single flat file containing all human RefSeq genes or a single query that retrieves all human genes from some database.

human gene • 3.9k views

Thanks a lot, guys. I found both NCBI and Ensembl, but I don't quite understand why the two databases report different numbers of RefSeq sequences: NCBI gives 4423 sequences, whereas Ensembl gives more than 19000. I also found another archive, http://www.genenames.org/cgi-bin/hgnc_stats.pl, which I guess is the repository of annotated genes. That number is consistent with Ensembl. Any suggestions on which one to use?

13.2 years ago
User 3869 ▴ 100

You can use the NCBI FTP site for RefSeq. A FASTA file of all genes is available there.
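Once a RefSeq FASTA file has been downloaded, a small parser is enough to pull out the records of interest. The following is a minimal sketch, not an official NCBI tool. It assumes a local plain or gzipped FASTA file whose headers start with the accession, and it keeps only `NM_` records (the RefSeq prefix for curated mRNAs). The function names are made up for illustration. Note that mRNA records include untranslated regions, so extracting only the coding region would additionally need CDS coordinates from the annotation.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def curated_mrnas(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) for records whose accession starts with NM_."""
    for header, sequence in read_fasta(path):
        fields = header.split()
        accession = fields[0] if fields else ""
        if accession.startswith("NM_"):
            yield accession, sequence
```

Usage would look like `for accession, seq in curated_mrnas("my_refseq_download.fna.gz"): ...`, where the file name is a placeholder for whatever was actually downloaded.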


Several large institutes, e.g., Ensembl, NCBI, and UCSC, maintain their own gene annotations. Each has its pros and cons.

The RefSeq gene annotation was proposed by and is maintained by NCBI. If you want to use RefSeq genes, as you mentioned in your question, you should use the NCBI FTP. If you want a more comprehensive (and noisier) annotation, try Ensembl.

13.2 years ago
Bert Overduin ★ 3.7k

See the Ensembl FTP site.


I don't know where you got the number 4423, but it simply cannot be right. I am pretty sure that there are RefSeqs for the majority of human protein-coding genes, so I would expect a number of at least around 20,000.
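One way to check a count like this is to tally the FASTA headers in the downloaded file directly. The sketch below is only an illustration: it assumes a local plain or gzipped FASTA file whose headers start with the accession, and the function name is hypothetical. It groups records by the part of the accession before the underscore (for example `NM` or `XM` in RefSeq); accessions without an underscore are counted as `other`.

```python
import gzip
from collections import Counter


def count_by_prefix(path: str) -> Counter:
    """Count FASTA records per accession prefix (text before the first '_')."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, not a record header
            fields = line[1:].split()
            accession = fields[0] if fields else ""
            if "_" in accession:
                prefix = accession.split("_", 1)[0]
            else:
                prefix = "other"
            counts[prefix] += 1
    return counts
```

Keep in mind that this counts sequence records, not genes: a gene with several transcripts contributes several records, and if a download is split across several files, a single file will undercount.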
