Question

Opening A Fasta File In Windows

0

13.1 years ago

Vivek • 0

Hi all,

I am a beginner with BLAST+ and I am using Windows. My aim for now is to download the nr protein sequences in FASTA format, format them with makeblastdb, then extract the first 1000 characters of the nr file into a separate file (say qa.fasta) and query it against the whole database.

I downloaded the nr database in FASTA format from this link:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz (are these the original FASTA files?)

Then I ran makeblastdb like this:

makeblastdb -in nr -dbtype prot -out outnr

This resulted in the nr file being split into several parts, nr.00 to nr.03. (Is this normal?)

Now I need help extracting the first 1000 characters from the nr file. How do I open a FASTA file in Windows, and how do I proceed?

fasta blast makeblastdb • 52k views

ADD COMMENT • link updated 13.1 years ago by Vivek • 0 • written 13.1 years ago by Vivek • 0

0

Why do you need the first 1000 char? Why did you put bioperl in the tags?

ADD REPLY • link 13.1 years ago by Manu Prestat 4.1k

0

I've removed the bioperl tag.

ADD REPLY • link 13.1 years ago by Neilfws 49k

score 2 · Answer 1 · 2012-03-06

2

13.1 years ago

Geparada ★ 1.5k

FASTA files are plain text files; you can open them with Notepad or even Word.

If you'll often do this kind of thing, you should use Unix. Life is too short to use Windows.

ADD COMMENT • link 13.1 years ago by Geparada ★ 1.5k

1

In the long term, switching to a Unix-style system may make sense. However, there is a learning curve to take into account. As a starting point, I suggest trying a biology-targeted Linux distribution (see http://en.wikipedia.org/wiki/BioLinux) in a virtual machine, for example using VirtualBox (https://www.virtualbox.org/).

ADD REPLY • link 13.1 years ago by Hamish ★ 3.3k

score 2 · Answer 2 · 2012-03-06

2

13.1 years ago

Manu Prestat 4.1k

Hi. First, I'm not sure "original" is the right term, but if you mean "do these FASTA files correspond exactly to the official nr database sequences?", the answer is yes.

Second, it is normal for the database files to be split. Nevertheless, I doubt that the database build ran to completion: I have never tried it on nr myself, but NCBI provides a ready-to-use nr BLAST database whose parts go up to nr.05. Do you have the alias file (nr.pal)?

Finally, as Geparada told you, FASTA files are text files, so open them with any text editor. A text editor is better than a word processor: you don't want grammar correction, or Times New Roman for IDs and Arial Italic for sequences, and more importantly, you want to save your first 1000 aa as plain text, not .doc, .rtf, etc. The difficulty is actually not the type of file but its size. I've never tried on Windows, but a former coworker used Notepad++ and seemed happy with it.
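Since the obstacle is file size rather than file type, the first characters can also be copied out without loading the file into an editor at all. A minimal Python sketch, assuming Python is installed on the Windows machine; the function name `head_chars` is hypothetical and the file names in the usage note are simply those mentioned in the question:

```python
def head_chars(src_path, dst_path, n_chars=1000):
    """Copy the first n_chars characters of src_path into dst_path.

    Only the requested prefix is read, so the total size of the
    source file does not matter.
    """
    # latin-1 maps every byte to one character, so decoding never fails;
    # newline="" leaves the original line endings untouched.
    with open(src_path, "r", encoding="latin-1", newline="") as src:
        prefix = src.read(n_chars)
    with open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dst.write(prefix)
    return prefix
```

For example, `head_chars("nr", "qa.fasta", 1000)` would write the first 1000 characters of the uncompressed nr file to qa.fasta. Note that a fixed character count can stop in the middle of a sequence, so the last record in the output may be truncated.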

ADD COMMENT • link 13.1 years ago by Manu Prestat 4.1k

0

The 'nr' BLAST database from NCBI contains additional information not present in the FASTA-format sequence data, since it is generated from the ASN.1. To ensure maximum compatibility, NCBI likely also uses a smaller part size, which avoids problems with some filesystems. So it isn't surprising that a manual build would give fewer parts.

ADD REPLY • link 13.1 years ago by Hamish ★ 3.3k

0

See http://en.wikipedia.org/wiki/List_of_text_editors for a list of text editors, many of which are available for MS Windows. You may find reading http://en.wikipedia.org/wiki/Text_editor helpful since it contains a definition of a text editor.

ADD REPLY • link 13.1 years ago by Hamish ★ 3.3k

score 1 · Answer 3 · 2012-03-06

1

13.1 years ago

Swbarnes2 ★ 1.6k

If you want to stick with Windows, use gvim or something like it. It's more powerful than Notepad and has no problem handling very large text files (and I think it's easier on the eyes than Notepad).

ADD COMMENT • link 13.1 years ago by Swbarnes2 ★ 1.6k

0

+1. Also, native text editors on different platforms treat some whitespace characters a bit differently. For example, a line break is '\n' on Unix (and on OS X), '\r\n' on Windows, and '\r' on classic Mac OS.

ADD REPLY • link 13.1 years ago by Damian Kao 16k

score 0 · Answer 4 · 2012-03-06

I don't understand why you didn't directly download the preformatted databases from NCBI in the first place. You can BLAST against them directly and get literally any information from them using the provided utilities. Even on Windows.

Ideally, use an editor that can handle line-ending conversion: line endings differ between Windows and Unix, and some tools will fail on the wrong ones. Not all Windows-to-Unix converters handle this accurately. I personally prefer Notepad++, where you can convert between line-ending styles as well.
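Line endings can also be normalized with a short script instead of an editor. A minimal Python sketch, assuming Python is available; `convert_line_endings` and its parameters are hypothetical names chosen for illustration:

```python
def convert_line_endings(src_path, dst_path, newline="\n"):
    """Rewrite src_path to dst_path with every line ending set to `newline`.

    Use newline="\n" for Unix-style output or newline="\r\n" for Windows-style.
    """
    # Reading with newline=None (universal newlines) turns "\r\n" and "\r"
    # into "\n"; writing with newline=... turns each "\n" into the target.
    with open(src_path, "r", encoding="latin-1", newline=None) as src, \
         open(dst_path, "w", encoding="latin-1", newline=newline) as dst:
        for line in src:  # streamed line by line, so large files are fine
            dst.write(line)
```

The file is processed one line at a time, so memory use stays small even for a very large FASTA file.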

score 0 · Answer 5 · 2012-03-06

0

13.1 years ago

Biomonika (Noolean) 3.2k

When opening large fasta files, I have been more than satisfied with JWrite. All other editors used to crash from time to time, especially when handling really large datasets.

ADD COMMENT • link 13.1 years ago by Biomonika (Noolean) 3.2k

score 0 · Answer 6 · 2012-03-08

Hi all,

Thanks for the replies, and apologies for being late to get back.

I am working on a research project with my professor. That's why I downloaded the FASTA files: I was asked to do so :)

The file is too big to be opened on Windows (by any editor), so I need to extract the first 1000 characters, just to get one sequence that I can use as a test query for BLAST.

Manu Prestat: yes, the nr.pal file was created.
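If the goal is a single test sequence, taking the first complete FASTA record may be safer than a fixed 1000 characters, since a character count can stop mid-sequence. A minimal Python sketch under stated assumptions: Python is installed, the function name `first_fasta_record` is hypothetical, and a file name ending in ".gz" is assumed to be gzip-compressed (so the downloaded nr.gz could be read without decompressing it first):

```python
import gzip


def first_fasta_record(src_path, dst_path):
    """Write the first FASTA record of src_path to dst_path and return it."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if src_path.endswith(".gz") else open
    record = []
    with opener(src_path, "rt", encoding="latin-1") as src:
        for line in src:
            if line.startswith(">") and record:
                break  # header of the second record: stop reading here
            if record or line.startswith(">"):
                record.append(line)  # header or sequence line of record 1
    with open(dst_path, "w", encoding="latin-1") as dst:
        dst.writelines(record)
    return "".join(record)
```

For example, `first_fasta_record("nr.gz", "qa.fasta")` would read only as far as the second header line and write the first record to qa.fasta.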