GenBank to FASTA sequence
9.2 years ago
Kumar ▴ 170

Hi,

I have a large GenBank-format file of nucleotide sequences, and I need to extract the FASTA sequence of every entry in the file.

sequence • 8.2k views
from Bio import SeqIO
# infile_genbank and outfile are filenames (or open handles) for the input and output
SeqIO.convert(infile_genbank, "genbank", outfile, "fasta")

http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank2fasta/

9.2 years ago

Something like this, which streams the file through awk and starts a new FASTA record at each ACCESSION line:

$ curl -Ls "ftp://ftp.ncbi.nlm.nih.gov/genbank/gbenv80.seq.gz" | gunzip -c |\
awk '/^ACCESSION   / {printf(">%s\n",$2);next;} /^ORIGIN/ {inseq=1;next;} /^\/\// {inseq=0;} {if(inseq==0) next; gsub(/[0-9 ]/,"",$0); printf("%s\n",$0);}' |\
head -n 30

>KP304532
cgcggcctatcagcttgttggtgaggtaatggctcaccaaggcaacgacgggtagctggt
ctgagaggacgatcagccacactggaactgagacacggtccagactcctacgggaggcag
cagcagggaatcttgcgcaatgggcgaaagcctgacgcagcgacgccgcgtgggggatga
aggccttcgggttgtaaacccctttcaggagggaagaaaatgacggtacctccagaagaa
gccccggccaactacgtgccagcagccgcggtaatacgtagggggcgagcgttgtccgga
tttattgggcgtaaagggctcgtaggcggcttgacaagtcgatcgtgaaaactcagggct
caaccctgagacgccggtcgatactgtcatggctagggtccggtagaggagaatggaatt
cccggtgtagcggtgaaatgcgcagatatcgggaggaacaccagtagcgaaggcggtcct
ctgggccggtaccgacgctgaggagcgaaagcgtggggagcaaacaggattagataccct
ggtagtccacgccgtaaacgttgggtactaggtgtggcgtctttatcaacggatgccgtg
ccgaagctaacgcattaagtaccccgcctggggagtacgg
>KP304533
cgcggcctatcagcttgttggtggggtaacggcctaccaaggcatcgacgggtagctggt
ctgagaggacgatcagccacactgggactgagacacggcccagactcctacgggaggcag
cagtggggaatattgcgcaatgggcgaaagcctgacgcagcaacgccgcgtgggggatga
aggctttcgggttgtaaacccctttcagtgatgacgaaaatgacggtaatcacagaagaa
gccccggccaactacgtgccagcagccgcggtaacacgtagggggcgagcgttgtccgga
tttattgggcgtaaagagctcgtaggcggttgcgtaagtcggacgtgaaaactcagggct
caaccctgagatgccgttcgatactgcgctgactagagtccggtaggggagcatggaatt
cctggtgtagcggtgaaatgcgcagatatcaggaggaacaccagtggcgaaggcggtgct
ctgggccggaactgacgctgaggagcgaaagcatgggtagcaaacaggattagataccct
ggtagtccatgccgtaaacgttgggcactaggtgtgggacctacttaacgggttccgtgc
cgtagctaacgcattaagtgccccgcctggggagtacgg
>KP304534
cgcggcctatcagcttgttggtgaggtaacggctcaccaaggcatcgacgggtagctggt
ctgagaggacgatcagccacactgggactgagacacggcccagactcctacgggaggcag
cagtagggaatcttgcgcaatgggcgaaagcctgacgcagcaacgccgcgtgggggatga
aggccttcgggtcgtaaacccctttcagcagggacgaaaatgacggtacctgcagaagaa
ggtccggccaactacgtgccagcagccgcggtaatacgtagggaccaagcgttgtccgga
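
If awk is not to hand, the same line-by-line logic can be sketched in plain Python with only the standard library. This is a rough equivalent, not a full GenBank parser: the names `genbank_to_fasta` and `write_fasta` are made up for illustration, and it assumes each record has an ACCESSION line, an ORIGIN block, and a closing `//`. Unlike the awk one-liner, it emits a record only when it reaches `//`.

```python
from typing import Iterable, Iterator, TextIO, Tuple


def genbank_to_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) for each record in a GenBank flat file.

    Hypothetical helper: it only looks at ACCESSION, ORIGIN and '//' lines.
    """
    accession = None
    chunks = []
    in_seq = False
    for line in handle:
        if line.startswith("ACCESSION"):
            fields = line.split()
            # Use the first (primary) accession, as the awk script does with $2
            accession = fields[1] if len(fields) > 1 else None
        elif line.startswith("ORIGIN"):
            in_seq = True
            chunks = []
        elif line.startswith("//"):
            # End of record: emit what was collected and reset the state
            if accession is not None:
                yield accession, "".join(chunks)
            accession, chunks, in_seq = None, [], False
        elif in_seq:
            # Sequence lines look like "       61 ctgagaggac gatcagccac ...";
            # keep the letters, drop the position numbers and spaces
            chunks.append("".join(c for c in line if c.isalpha()))


def write_fasta(records: Iterable[Tuple[str, str]], out: TextIO, width: int = 60) -> int:
    """Write records as FASTA wrapped at `width` columns; return the record count."""
    count = 0
    for accession, seq in records:
        out.write(f">{accession}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
        count += 1
    return count
```

Because `genbank_to_fasta` is a generator, only one record is held in memory at a time, which matters for a large file. For a gzipped input, `gzip.open(path, "rt")` gives a text handle that can be passed straight to it.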

Because the use case is converting a large file into FASTA format, not fetching a single sequence by accession number in FASTA format.
