RefSeq (.fna) vs. (.gbff) Files
10.7 years ago
Truxton ▴ 20

When I downloaded transcript data from RefSeq via their FTP site (ftp://ftp.ncbi.nlm.nih.gov/refseq/), I noticed that there are two file types: .gbff and .fna. Is it correct to assume that the .gbff file (GenBank flat file format, I believe) contains EXACTLY the same sequence information as the .fna file (FASTA format sequences), in the same order, except that the .fna file has only short one-line descriptions for the sequences?

Also, what are the possible last 'words' in the ">..." title for each sequence in the .fna file? I've seen for example 'mRNA' and 'ncRNA', and so forth. Is there a fixed and standardized list by chance?

sequence database • 12k views
10.6 years ago
Pablacious ▴ 630

The > line in a FASTA file is divided only into an identifier part and a description part. The identifier runs from the > to the first space. Whatever follows the first space is the description, which can contain as many spaces as you like, so as far as the format is concerned there is no such thing as a "last word". The description is optional; the identifier is not. And yes, I would say the sequences should be the same; the .gbff file just carries more "structured" metadata.
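To make the identifier/description split concrete, here is a minimal Python sketch (standard library only). The function names `split_header` and `tally_last_words` are made up for illustration, and the header in the usage comment is an invented example, not a real RefSeq record. It splits on the first run of whitespace, which is an assumption slightly broader than "first space". If you want to know which trailing words actually occur in your .fna file, tallying them yourself is probably more reliable than trusting any list.

```python
from collections import Counter


def split_header(line):
    """Split a FASTA header line into (identifier, description).

    The identifier is everything between '>' and the first whitespace;
    the description is whatever follows (possibly empty).
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    parts = line[1:].rstrip().split(None, 1)
    if not parts:
        raise ValueError("header has no identifier")
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def tally_last_words(lines):
    """Count the final whitespace-separated token of each description."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            _, description = split_header(line)
            if description:
                counts[description.split()[-1]] += 1
    return counts


# Usage (hypothetical header and file name):
#   split_header(">NM_000000.0 Example gene (ABC1), mRNA")
#   with open("some_file.fna") as handle:
#       print(tally_last_words(handle).most_common())
```

Note that the "last word" here is only a convention of whoever wrote the description; the FASTA format itself attaches no meaning to it.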
