refseq/description/comments summary field download
7.4 years ago
theoharis ▴ 40

Hi

At ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/ there are 22 files named refseqgene.N.genomic.gbff.gz (where N is 1 to 22), which contain the RefSeq comments, e.g.

LOCUS       NG_054889             120855 bp    DNA     linear   PRI 13-JUN-2017
DEFINITION  Homo sapiens cytoplasmic FMR1 interacting protein 1 (CYFIP1),
            RefSeqGene on chromosome 15.
ACCESSION   NG_054889
VERSION     NG_054889.1
KEYWORDS    RefSeq; RefSeqGene.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence was derived from AC090764.35, KF456021.1 and
            AC011767.12.
            This sequence is a reference standard in the RefSeqGene project.

            Summary: This gene encodes a protein that regulates cytoskeletal
            dynamics and protein translation. The encoded protein is a
            component of the WAVE regulatory complex (WRC), which promotes
            actin polymerization.This protein also interacts with the Fragile X
            mental retardation protein (FMRP) and translation initiation factor
            4E to inhibit protein translation. A large chromosomal deletion
            including this gene is associated with increased risk of
            schizophrenia and epilepsy in human patients. Reduced expression of
            this gene has been observed in various human cancers and the
            encoded protein may inhibit tumor invasion. [provided by RefSeq,
            May 2017].

My question is:

  • Is it possible to obtain a single file of all the 22 files?

  • Is it possible to have them formatted in a table (without a script)?

Thanks

gene genome • 1.8k views

Is it possible to obtain a single file of all the 22 files?

seq 1 22 | while read F; do wget -O - "ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/refseqgene.${F}.genomic.gbff.gz" | gunzip -c ; done > out.gbff
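On systems without wget (macOS ships curl by default), the same loop can be sketched with curl. This is only a sketch: the base URL and the count of 22 files are taken from this thread and may have changed since, and `refseqgene_urls` and `fetch_all` are hypothetical helper names.

```shell
# Base URL and file count come from the thread; both are assumptions today.
BASE="ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene"

# Hypothetical helper: print the 22 file URLs, one per line.
refseqgene_urls() {
  for n in $(seq 1 22); do
    printf '%s/refseqgene.%s.genomic.gbff.gz\n' "$BASE" "$n"
  done
}

# Hypothetical helper: download each file with curl (-s = silent)
# and write the decompressed records to stdout.
fetch_all() {
  refseqgene_urls | while read -r url; do
    curl -s "$url" | gunzip -c
  done
}
```

Running `fetch_all > out.gbff` should then give the same single file as the wget loop.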

Is it possible to have them formatted in a table (without a script)?

Without a script? No.


Is there a script you can suggest?


Sure:

 wget -O - "ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/refseqgene.1.genomic.gbff.gz" | gunzip -c | awk 'BEGIN{ok=0;} /^(ACCESSION|COMMENT) / {ok=1;printf("%s ",$0);next;} /^            / {if(ok==1) printf("%s ",substr($0,13));next;}  {if(ok==1) printf("\n"); ok=0;}'
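If a real table is wanted (one row per record), a variant along these lines might work. It is a sketch, not a tested pipeline: `gbff_to_tsv` is a hypothetical helper name, and it assumes the usual GenBank flat-file layout seen in the record above (keywords in column 1, continuation text indented by 12 spaces, each record terminated by `//`).

```shell
# Hypothetical helper: read GenBank flat-file text on stdin and print one
# tab-separated row per record: accession, then the joined COMMENT text.
gbff_to_tsv() {
  awk '
    # A top-level keyword starts in column 1; remember the current field.
    /^[A-Z]/ { field = $1 }
    /^ACCESSION/ { acc = $2; next }
    # The COMMENT text starts at column 13 of its first line.
    /^COMMENT/ { comment = substr($0, 13); next }
    # Indented lines inside COMMENT are continuations; skip blank ones.
    field == "COMMENT" && /^ / {
      line = $0
      sub(/^ +/, "", line)
      if (line != "") comment = comment " " line
      next
    }
    # "//" ends a record: emit the row and reset the state.
    $1 == "//" { print acc "\t" comment; acc = ""; comment = ""; field = "" }
  '
}
```

With the concatenated file from the first answer, `gbff_to_tsv < out.gbff > out.tsv` would give a two-column file that opens in a spreadsheet.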
7.4 years ago
theoharis ▴ 40

Elegant, thank you.

(for mac users this is what I had to do to get wget
