First Line In Fasta Format
13.0 years ago
User 5037 ▴ 290

Hi all. This is the first line from a FASTA file.

gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase

What does each field separated by | mean?

13.0 years ago
Chris ★ 1.6k

GI number (GenInfo Identifier, a numeric ID for the sequence record), RefSeq accession, name of the protein. An in-depth explanation can be found in NCBI's RefSeq handbook.
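
To make the field layout concrete, here is a minimal Python sketch that splits a header like the one in the question into its tagged identifiers and the protein name. It assumes the line follows the tag|value|tag|value|title pattern shown above; the function name parse_defline is hypothetical and not part of any library.

```python
def parse_defline(line):
    """Split a pipe-delimited header into (identifiers, title).

    Assumption: fields alternate tag, value, tag, value, ...
    and the last field is the free-text title.
    """
    # Drop the leading '>' if present, then split on the pipe symbol.
    fields = line.lstrip(">").strip().split("|")
    # Everything except the last field is treated as tag/value pairs.
    *id_fields, title = fields
    ids = dict(zip(id_fields[0::2], id_fields[1::2]))
    return ids, title.strip()


header = "gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase"
ids, title = parse_defline(header)
print(ids)    # {'gi': '186681228', 'ref': 'YP_001864424.1'}
print(title)  # phycoerythrobilin:ferredoxin oxidoreductase
```

A header without any | ends up with an empty identifier dictionary and the whole line as the title.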

gi: GI number (GenInfo Identifier). ref: RefSeq accession. Read Chris's answer again, please.

What do gi and ref mean?

Is it compulsory to write gi and ref in the first line? Can't I just give the GI number and the RefSeq accession without writing gi and ref?

I'm not sure I understand the motivation for your question. For internal use you can put whatever you want into the FASTA header; there is no restriction. On the other hand, since RefSeq is a public resource for a broad audience of scientists with different backgrounds, the keywords gi and ref tell you what kind of identifier follows. A bare number is ambiguous, and it might not be obvious to some people that it refers to a GenBank sequence record. As for the RefSeq ID, the prefix YP_ already implies a RefSeq ID. I guess the 'ref' is there for consistency.

13.0 years ago

The first line is a brief description of the sequence. It can contain a number of identifying fields that, as in the example you provide, could be separated by a standard symbol and parsed, or it can contain free text.
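
As a rough illustration of the "identifier plus free text" case, the sketch below uses the common convention that the first whitespace-delimited word of the header is the identifier and the rest is the description. That convention is an assumption here, not a rule of the format, and split_header is a hypothetical helper name.

```python
def split_header(line):
    """Return (identifier, description) from a FASTA header line.

    Assumption: the identifier is the first whitespace-delimited word
    and everything after it is free-text description.
    """
    # Remove the leading '>' marker and surrounding whitespace.
    text = line.lstrip(">").strip()
    # Split once on any run of whitespace (spaces or tabs).
    parts = text.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


print(split_header(">seq1 my favourite protein"))
# ('seq1', 'my favourite protein')
```

With the header from the question, the identifier comes back as the whole pipe-delimited block (gi|186681228|ref|YP_001864424.1|), which can then be split on | if the individual fields are needed.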
