What is the best way to get a set of all of the genes for a genome via entrez?
I tried using the term "GCF_009858895[AACC]" for the "gene" database, but got no hits.
entrez
written 4.2 years ago by DNAlias • updated 4.2 years ago by GenoMax
Using Entrez Direct (EDirect):
A. If you just need names
$ esearch -db assembly -query "GCF_009858895" | elink -target genome | elink -target gene | esummary | xtract -pattern DocumentSummary -element Name,Description
ORF1ab ORF1a polyprotein;ORF1ab polyprotein
S surface glycoprotein
N nucleocapsid phosphoprotein
ORF7a ORF7a protein
ORF6 ORF6 protein
ORF3a ORF3a protein
ORF7b ORF7b
ORF10 ORF10 protein
M membrane glycoprotein
E envelope protein
ORF8 ORF8 protein
B. If you want additional information
$ esearch -db assembly -query "GCF_009858895" | elink -target genome | elink -target gene | efetch -format tabular
tax_id Org_name GeneID CurrentID Status Symbol Aliases description other_designations map_location chromosome genomic_nucleotide_accession.version start_position_on_the_genomic_accession end_position_on_the_genomic_accession orientation exon_count OMIM
2697049 Severe acute respiratory syndrome coronavirus 2 43740578 0 live ORF1ab GU280_gp01 ORF1a polyprotein;ORF1ab polyprotein ORF1a polyprotein;ORF1ab polyprotein NC_045512.2 266 21555 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740568 0 live S GU280_gp02, spike glycoprotein surface glycoprotein surface glycoprotein NC_045512.2 21563 25384 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740575 0 live N GU280_gp10 nucleocapsid phosphoprotein nucleocapsid phosphoprotein NC_045512.2 28274 29533 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740573 0 live ORF7a GU280_gp07 ORF7a protein ORF7a protein NC_045512.2 27394 27759 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740572 0 live ORF6 GU280_gp06 ORF6 protein ORF6 protein NC_045512.2 27202 27387 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740569 0 live ORF3a GU280_gp03 ORF3a protein ORF3a protein NC_045512.2 25393 26220 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740574 0 live ORF7b GU280_gp08 ORF7b ORF7b NC_045512.2 27756 27887 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740576 0 live ORF10 GU280_gp11 ORF10 protein ORF10 protein NC_045512.2 29558 29674 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740571 0 live M GU280_gp05 membrane glycoprotein membrane glycoprotein NC_045512.2 26523 27191 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740570 0 live E GU280_gp04 envelope protein envelope protein NC_045512.2 26245 26472 plus 0
2697049 Severe acute respiratory syndrome coronavirus 2 43740577 0 live ORF8 GU280_gp09 ORF8 protein ORF8 protein NC_045512.2 27894 28259 plus 0
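If you only need a few of those columns, a small awk filter can pick them out by header name. This is a minimal sketch, not part of the original answer. It assumes the tabular output is tab-separated with the header row shown above. The temporary file and the two sample rows (S and E, copied from the output above) are only there so the snippet runs without a network connection; in practice you would redirect the `efetch -format tabular` output to a file instead.

```shell
# Assumption: efetch -format tabular emits tab-separated columns with the header shown above.
# Two rows from that output are recreated here in a temporary file for illustration.
tsv=$(mktemp)
printf 'tax_id\tOrg_name\tGeneID\tCurrentID\tStatus\tSymbol\tAliases\tdescription\tother_designations\tmap_location\tchromosome\tgenomic_nucleotide_accession.version\tstart_position_on_the_genomic_accession\tend_position_on_the_genomic_accession\torientation\texon_count\tOMIM\n' > "$tsv"
printf '2697049\tSevere acute respiratory syndrome coronavirus 2\t43740568\t0\tlive\tS\tGU280_gp02, spike glycoprotein\tsurface glycoprotein\tsurface glycoprotein\t\t\tNC_045512.2\t21563\t25384\tplus\t0\t\n' >> "$tsv"
printf '2697049\tSevere acute respiratory syndrome coronavirus 2\t43740570\t0\tlive\tE\tGU280_gp04\tenvelope protein\tenvelope protein\t\t\tNC_045512.2\t26245\t26472\tplus\t0\t\n' >> "$tsv"

# Map header names to column numbers on the first line, then print
# symbol, start and end for every data row.
out=$(awk 'BEGIN { FS = OFS = "\t" }
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
{ print $col["Symbol"], $col["start_position_on_the_genomic_accession"], $col["end_position_on_the_genomic_accession"] }' "$tsv")
echo "$out"
rm -f "$tsv"
```

Looking columns up by name rather than by fixed position keeps the filter working if the column order differs from what is shown here.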
So does this mean that you are extracting the genes from the genome entry instead of the assembly entry? There are some assemblies that don't seem to have links to genomes.
In this case, the accession you provided is for an assembly, so linking it to a genome and then to the genes worked. If an assembly has no genome link and/or no annotation, you can't get gene names for it this way.
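To catch the case of an assembly without a genome link before running the full pipeline, you could check the link count first. This is a rough sketch under some assumptions: `genes_for_assembly` is a hypothetical helper name, and it assumes that `elink` passes along an `ENTREZ_DIRECT` message whose `Count` element can be read with `xtract`, which may vary between EDirect versions.

```shell
# Hypothetical helper: print gene names for an assembly accession,
# or fail with a message if the assembly has no linked genome record.
genes_for_assembly() {
    acc="$1"
    # Assumption: elink emits an ENTREZ_DIRECT message containing a Count element.
    n=$(esearch -db assembly -query "$acc" | elink -target genome |
        xtract -pattern ENTREZ_DIRECT -element Count)
    if [ "${n:-0}" -eq 0 ]; then
        echo "No linked genome record found for $acc" >&2
        return 1
    fi
    # Same pipeline as in part A above.
    esearch -db assembly -query "$acc" | elink -target genome | elink -target gene |
        esummary | xtract -pattern DocumentSummary -element Name,Description
}
```

Calling `genes_for_assembly GCF_009858895` should then behave like the command in part A, while an accession without a genome link returns a non-zero exit status instead of silently printing nothing. A genome link alone does not guarantee that gene records exist, so empty output is still possible for unannotated assemblies.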