11.1 years ago
amritayadav1991
How to convert a FASTA sequence into binary format (0-1)?
> xxd -g 0 -b file.fasta
0000000: 001111100110011101101001011111000011001000110010 >gi|22
0000006: 001101000011010100111000001110010011100000110000 458980
000000c: 001100000111110001110010011001010110011001111100 0|ref|
0000012: 010011100100001101011111001100000011000000110000 NC_000
> xxd -g 0 -b file.fasta | cut -d' ' -f2 -
001111100110011101101001011111000011001000110010
001101000011010100111000001110010011100000110000
001100000111110001110010011001010110011001111100
010011100100001101011111001100000011000000110000
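For anyone without xxd at hand, here is a rough Python sketch of the same idea: print each byte of the file as eight 0/1 characters, six bytes per line. The function name `bytes_to_bit_lines` and the default width are my own choices for illustration, not part of any tool.

```python
from typing import List


def bytes_to_bit_lines(data: bytes, width: int = 6) -> List[str]:
    """Return one string of 0/1 characters for every `width` input bytes."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        # Format each byte as exactly eight binary digits, most significant bit first.
        lines.append("".join(f"{byte:08b}" for byte in chunk))
    return lines


if __name__ == "__main__":
    # First bytes of the header shown in the xxd output above.
    for line in bytes_to_bit_lines(b">gi|22458980"):
        print(line)
```

To run it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in.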
What do you really want? NCBI asn1tool, NCBI makeblastdb, UCSC faToTwoBit, even gzip, etc. all do that job.
Given that any computer representation is a binary encoding, you don't even need to go that far. A FASTA file is already a binary encoding of a biological sequence: the sequence has been encoded into one-letter codes representing each residue/base, with a little metadata added.
I would guess that what is really being looked for is details of a nucleotide sequence encoding, such as the GCG 2-bit encoding where the four basic DNA bases are represented in two bits:
00 = C, 01 = T, 10 = A, 11 = G
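A minimal sketch of how such a 2-bit packing could look in Python, using the mapping quoted above. The names `pack_2bit` and `unpack_2bit`, the zero padding of the last byte, and the need to keep the sequence length separately are assumptions of this sketch; it is not a reader or writer for any actual GCG file format, and it does not handle N or other ambiguity codes.

```python
# Mapping as quoted in the post: 00 = C, 01 = T, 10 = A, 11 = G
CODE = {"C": 0b00, "T": 0b01, "A": 0b10, "G": 0b11}
BASES = "CTAG"  # index in this string == 2-bit code


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (last byte zero padded)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4].upper()
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # raises KeyError on anything but A, C, G, T
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` is the original number of bases."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

For example, `pack_2bit("GATTACA")` yields two bytes, and `unpack_2bit(packed, 7)` gives back `"GATTACA"`. The length has to be stored somewhere, because the padding bits would otherwise decode as extra C bases.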
Various forms of this and other encodings that provide compression are discussed in "Which Dna Compression Algorithms Are Actually Used?".
For that matter, Microsoft Word find/replace would do it...
The bash program is available at
I don't want to be harsh, but this is probably not a good method. Printing the text character '0' takes 8 bits; it is not the same as a binary digit, which would occupy 1 bit. So your script will double the size of any FASTA file simply by writing two characters in place of each DNA letter. If you are just learning, it's interesting to practice this way, but this would not be a recommended script.
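To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes the script writes two '0'/'1' text characters per base, as described above, and it ignores header lines and newlines; the function name `encoded_sizes` is just for illustration.

```python
import math


def encoded_sizes(n_bases: int) -> dict:
    """Approximate size in bytes of n_bases nucleotides under three representations."""
    return {
        "fasta_letters": n_bases,               # one 8-bit character per base
        "text_01": 2 * n_bases,                 # two 8-bit '0'/'1' characters per base
        "packed_2bit": math.ceil(n_bases / 4),  # real 2-bit codes, four bases per byte
    }


if __name__ == "__main__":
    print(encoded_sizes(1000))
```

So the "0/1 as text" output is eight times larger than a true 2-bit packing of the same sequence.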