How To Convert A Fasta Sequence Format Into Binary Format (0-1)?
11.1 years ago

How can I convert a FASTA sequence into a binary (0/1) format?


What do you really want? NCBI asn1tool, NCBI makeblastdb, UCSC faToTwoBit, even gzip, etc. They all do that job.


Given that any computer representation is a binary encoding, you don't even need to go that far. A FASTA file is already a binary encoding of a biological sequence: the sequence has been encoded into one-letter codes, one per residue or base, with a little metadata added.

I would guess that what is really being looked for is details of a nucleotide sequence encoding, such as the GCG 2-bit encoding where the four basic DNA bases are represented in two bits:

00 = C, 01 = T, 10 = A, 11 = G

Various forms of this and other encodings that provide compression are discussed in "Which Dna Compression Algorithms Are Actually Used?".
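
To make the 2-bit idea concrete, here is a minimal Python sketch that packs four bases into each byte using the mapping quoted above (C=00, T=01, A=10, G=11). The function names, the example record and the choice to left-align a short final chunk are my own assumptions for illustration, not part of any GCG tool or file format. It handles only A/C/G/T and raises a KeyError on N or other ambiguity codes.

```python
# 2-bit codes as quoted in the post above: C=00, T=01, A=10, G=11.
CODE = {"C": 0b00, "T": 0b01, "A": 0b10, "G": 0b11}
BASE = {value: base for base, value in CODE.items()}


def read_fasta_sequence(text):
    """Concatenate the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line and not line.startswith(">")
    ).upper()


def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # KeyError on N and other symbols
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover `length` bases from bytes produced by pack()."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


# Hypothetical record, made up for this example.
fasta = ">example\nACGT\nTTGA\nC\n"
seq = read_fasta_sequence(fasta)
packed = pack(seq)
print(seq, len(seq), len(packed))
print(" ".join(f"{b:08b}" for b in packed))
assert unpack(packed, len(seq)) == seq
```

The nine bases of the example fit in three bytes. The sequence length has to be stored separately, because the padding bits in the last byte are indistinguishable from C.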


For that matter, Microsoft Word find/replace would do it...


I don't want to be harsh, but this is probably not a good method. Printing the text character '0' takes 8 bits; it is not the same as a binary digit, which occupies 1 bit. By writing two characters in place of each DNA letter, your script doubles the size of any FASTA file. If you are just learning, it is an interesting exercise, but I would not recommend the script for real use.
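
The size argument can be checked with a small Python sketch. It assumes the same 2-bit mapping mentioned earlier in the thread (C=00, T=01, A=10, G=11); the function names and the 1000-base test sequence are made up for illustration.

```python
TABLE = {"C": "00", "T": "01", "A": "10", "G": "11"}


def as_text_bits(seq):
    """Write each base as two ASCII characters, '0' or '1' (one byte each)."""
    return "".join(TABLE[base] for base in seq)


def as_packed_bits(seq):
    """Store the same 0/1 digits as real bits, eight per byte."""
    text = as_text_bits(seq)
    if not text:
        return b""
    text += "0" * (-len(text) % 8)  # pad the last byte with zero bits
    return int(text, 2).to_bytes(len(text) // 8, "big")


seq = "ACGT" * 250  # 1000 bases, hypothetical input
text_bits = as_text_bits(seq)
packed = as_packed_bits(seq)

print(len(seq.encode("ascii")))        # size as plain letters
print(len(text_bits.encode("ascii")))  # size as '0'/'1' characters
print(len(packed))                     # size as packed bits
```

For 1000 bases this gives 1000 bytes as letters, 2000 bytes as '0'/'1' characters, and 250 bytes when the bits are actually packed.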

11.1 years ago
> xxd -g 0 -b file.fasta
0000000: 001111100110011101101001011111000011001000110010  >gi|22
0000006: 001101000011010100111000001110010011100000110000  458980
000000c: 001100000111110001110010011001010110011001111100  0|ref|
0000012: 010011100100001101011111001100000011000000110000  NC_000

> xxd -g 0 -b file.fasta | cut -d' ' -f2 -
001111100110011101101001011111000011001000110010
001101000011010100111000001110010011100000110000
001100000111110001110010011001010110011001111100
010011100100001101011111001100000011000000110000
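
If xxd is not available, roughly the same dump can be produced in Python. This is only a sketch: `bit_dump` is a hypothetical helper name, and the six-bytes-per-line default is chosen to match the output shown above.

```python
def bit_dump(data, cols=6):
    """Return one string of 0/1 characters per `cols` bytes of input."""
    return [
        "".join(f"{byte:08b}" for byte in data[i:i + cols])
        for i in range(0, len(data), cols)
    ]


# First six bytes of the header shown in the xxd output above.
header = b">gi|22"
for line in bit_dump(header):
    print(line)
```

To dump a whole file, read it in binary mode, for example `bit_dump(open("file.fasta", "rb").read())`. Note that this prints the ASCII bytes of the file as text digits; it does not compress anything.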

:-) .
