NCBI Gene database info extraction
5.1 years ago
Shahzad ▴ 30

Hi all, I want to extract some information for about 2000 genes from the NCBI Gene database. I have the GI numbers and official gene symbols of these genes (https://imgur.com/A8ARi2U). I need the Gene Ontology information in text form if possible. Is there any way to do this, or is there another database that can be used to collect this information for the plant Arabidopsis?

NCBI Gene Database gene ontology
5.1 years ago

Hi,

It appears that the Gene Ontology annotation for Arabidopsis is provided by TAIR. You can download their GO annotation files directly.

Look at https://www.arabidopsis.org/download_files/GO_and_PO_Annotations/Gene_Ontology_Annotations

The file ATH_GO_GOSLIM.txt might be of interest to you. It contains the GO annotations in plain text.
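If you only need the ~2000 genes rather than the whole file, something like the following could cut it down. This is a minimal sketch, not a TAIR tool: it ASSUMES the first tab-separated column of ATH_GO_GOSLIM.txt is the AGI locus code (e.g. AT1G01010), so check the file's layout first. It also assumes a hypothetical input file with one AGI code per line; if you only have symbols or GIs you would need to map them to AGI codes first.

```shell
#!/bin/sh
# Sketch: keep only rows of TAIR's ATH_GO_GOSLIM.txt whose first column
# matches an ID in your list.
# ASSUMPTION: column 1 holds the AGI locus code (e.g. AT1G01010).
#   $1  ids file: one AGI code per line (hypothetical input file)
#   $2  ATH_GO_GOSLIM.txt downloaded from TAIR
filter_tair_go() {
  awk 'BEGIN { FS = "\t" }
       # First file: remember each ID (strip Windows CR, upper-case it)
       FILENAME == ARGV[1] { sub(/\r$/, ""); if ($1 != "") keep[toupper($1)] = 1; next }
       # Second file: print annotation rows whose locus code is in the list
       (toupper($1) in keep)' "$1" "$2"
}
```

Usage: `filter_tair_go my_agi_ids.txt ATH_GO_GOSLIM.txt > my_genes_go.tsv`. Matching on `FILENAME == ARGV[1]` instead of the common `NR == FNR` idiom keeps the script correct even if the ID file happens to be empty.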

5.1 years ago
vkkodali_ncbi ★ 3.8k

The file you are looking for is gene2go.gz in the NCBI Gene FTP directory (https://ftp.ncbi.nlm.nih.gov/gene/DATA/). It contains Gene Ontology data in tab-delimited format with the following columns (values shown are from one Arabidopsis record):

  #tax_id [  1]: 3702
   GeneID [  2]: 814629
    GO_ID [  3]: GO:0005634
 Evidence [  4]: ISM
Qualifier [  5]: -
  GO_term [  6]: nucleus
   PubMed [  7]: -
 Category [  8]: Component

Additional information about this file, including an explanation of the field names and the update frequency, is in the README: https://ftp.ncbi.nlm.nih.gov/gene/DATA/README

You can use the following awk command to extract the Arabidopsis thaliana rows (tax_id 3702) from that table:

zcat gene2go.gz | awk 'BEGIN{FS="\t";OFS="\t"}($1==3702)' > arabidopsis_gene2go.tsv
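gene2go is keyed by GeneID, while the question starts from gene symbols, so one more join is needed. The sketch below is hedged and assumes gene_info.gz from the same FTP directory, whose first four columns are #tax_id, GeneID, Symbol and LocusTag (see the README). It accepts either official symbols or AGI locus tags. It does NOT handle GI numbers, and the input file and function names are hypothetical.

```shell
#!/bin/sh
# Sketch: GO terms for a list of Arabidopsis gene symbols (or AGI locus tags).
#   $1  symbols file: one symbol/locus tag per line (hypothetical input)
#   $2  gene_info.gz from https://ftp.ncbi.nlm.nih.gov/gene/DATA/
#   $3  arabidopsis_gene2go.tsv produced by the awk command above
# Prints tab-separated: Query, GeneID, GO_ID, GO_term, Category
go_for_symbols() {
  symbols=$1 gene_info=$2 gene2go=$3
  map=$(mktemp) || return 1
  # gene_info columns: 1 tax_id, 2 GeneID, 3 Symbol, 4 LocusTag
  # (gzip -dc is used instead of zcat for portability)
  gzip -dc "$gene_info" |
    awk 'BEGIN { FS = "\t" } $1 == 3702 { print $2 "\t" $3 "\t" $4 }' > "$map"
  awk 'BEGIN { FS = OFS = "\t"; print "Query", "GeneID", "GO_ID", "GO_term", "Category" }
       # File 1: the wanted symbols/locus tags (Windows CR stripped)
       FILENAME == ARGV[1] { sub(/\r$/, ""); if ($0 != "") want[$0] = 1; next }
       # File 2: GeneID -> query, matching on Symbol first, then LocusTag
       FILENAME == ARGV[2] {
         if ($2 in want) query[$1] = $2
         else if ($3 in want) query[$1] = $3
         next
       }
       # File 3 (gene2go): 2 GeneID, 3 GO_ID, 6 GO_term, 8 Category
       ($2 in query) { print query[$2], $2, $3, $6, $8 }' \
    "$symbols" "$map" "$gene2go"
  status=$?
  rm -f "$map"
  return "$status"
}
```

Usage: `go_for_symbols my_symbols.txt gene_info.gz arabidopsis_gene2go.tsv > my_genes_go.tsv`. Symbol matching is exact and case-sensitive; symbols that are not current official symbols (e.g. old synonyms) will simply produce no rows.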