What is a FASTA format file?
3.5 years ago
Student ▴ 30

Hello.

I am a bit confused about what it means for a file to be in FASTA format. If I download the sequence of a genome from NCBI by choosing "Send to" > "Coding Sequences" > "Format: FASTA Nucleotide", I get a file containing the gene sequences, each one starting with a ">" followed by a description.

That should make it a FASTA format file, right? But the file I download is actually a .txt file. So how should I treat it? The content is organized like a FASTA file, but the file itself is a txt file. What can I do to get a real FASTA format file?

Sorry if my question is trivial, but I am pretty new to this field. Thank you in advance.

P.S. I am on Windows 10, not Linux.

Sequence NCBI FASTA Genome • 2.3k views

File suffixes basically do not mean anything: on Unix, the content and integrity of a file are mostly (or always? I don't know) independent of the suffix. If the file is formatted as FASTA then you can use it as such, or simply change the .txt suffix to .fa. Sometimes NCBI makes you scratch your head; I am not sure why they provide this as txt.
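
To illustrate that the content, not the suffix, is what matters, here is a minimal Python sketch that checks whether a file looks like FASTA regardless of its name. The function name looks_like_fasta is made up for this example, and the check is only a rough heuristic (first non-blank line starts with ">"), not a full validator.

```python
from pathlib import Path


def looks_like_fasta(path):
    """Return True if the first non-blank line starts with '>'.

    Hypothetical helper: a rough heuristic only. The file extension
    is never inspected, just the content.
    """
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                # FASTA records begin with a '>' header line.
                return line.startswith(">")
    return False  # empty file or only blank lines
```

Calling looks_like_fasta("sequence.txt") on the NCBI download should return True if the file begins with a ">" header line, even though the name ends in .txt.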

3.5 years ago
GenoMax 147k

That is likely an oddity of your OS/browser. It added the ".txt" extension because it recognized the data as plain text, which FASTA format data is. Using ChromeOS, a FASTA file gets downloaded as sequence.fasta, with no .txt added.

Here is a tip. If you get a "Save file as" dialog, you can put double quotes around the file name (e.g. "sequence.fasta") and the file should be saved with that name, without the .txt extension.


GenoMax, I do not think it depends on the operating system or the browser. I tried again after switching from Windows 10 to Ubuntu 20.04 and it still downloads as a txt file, also when using Chrome. Maybe it is due to NCBI, as ATpoint suggested.


A FASTA format file is plain text. If you double-click the file, it will open in your default text editor. If you have a program like SnapGene/DNASTAR etc. installed on your computer, then a FASTA format file should be associated with that program and will open in it as a DNA/RNA/protein sequence.
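
Because the file is just plain text, it can also be read with a few lines of code, whatever the extension is. Below is a minimal Python sketch, assuming a simple well-formed FASTA file; read_fasta is a name invented for this example, and a dedicated parser would be the more robust choice for real work.

```python
def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA-formatted text file.

    Hypothetical helper for illustration: the file name and extension
    are irrelevant, only the '>' header lines are used to split records.
    """
    header, chunks = None, []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence lines are concatenated into one string.
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, looping with for header, seq in read_fasta("sequence.txt") and printing header and len(seq) would list every record in the downloaded file along with its sequence length.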


OK, I do not have those programs so I cannot try. But, for example, if in NCBI I select "Send to" > "Complete Record" > "File" > "Format: FASTA", it downloads a file that is actually a FASTA file (that is, with the suffix .fasta), independently of whether I open it with my text editor or something else. If instead I follow "Send to" > "Coding Sequences" > "Format: FASTA Nucleotide" (in order to get all the coding sequences), I download a .txt file.


Ah, I see. If one follows the path you mention, then NCBI does indeed name the output file sequence.txt at the source. My comments above came from a general observation that browsers will sometimes add a .txt extension (e.g. making the file sequence.fasta.txt) to certain downloaded files.

So in this specific instance ATpoint is right: NCBI is sending the file with that extension. It is not technically wrong, since a FASTA file is text, but it also does not visually indicate that the file is in FASTA format. Simply run mv sequence.txt sequence.fasta to fix that (on Windows, ren sequence.txt sequence.fasta in the Command Prompt does the same).
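
If you would rather not depend on the shell (the original poster is on Windows 10), the same rename can be sketched in Python, which behaves the same on Windows and Linux. The function name rename_to_fasta is made up for this example, and refusing to overwrite an existing target is my own assumption about the safest behavior.

```python
from pathlib import Path


def rename_to_fasta(path, new_suffix=".fasta"):
    """Rename e.g. sequence.txt to sequence.fasta and return the new path.

    Hypothetical helper: only the name changes, the content is untouched.
    """
    source = Path(path)
    target = source.with_suffix(new_suffix)
    if target.exists():
        # Assumption: refuse to overwrite rather than silently replace.
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```

Calling rename_to_fasta("sequence.txt") in the download folder would leave you with sequence.fasta containing exactly the same text.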
