Converting FASTA files to GenBank or EMBL format
9.9 years ago
samuel.medi ▴ 10

I have 7000 genes and their proteins, as well as the genome of a bacterium I am working on. I want to convert these files into either GenBank or EMBL format. My problem is that I don't have any scripting skills. I tried an online tool (http://genome.nci.nih.gov/tools/reformat.html), but its output appears to lack some information.

Does anyone know a tool or a way to convert these sequences?

format fasta genbank • 14k views

I can develop this script for you. Give me the original file format and the desired final format.


These are the files...

and this is how I want them to look (NB: just the format)


Final format should be similar to this: http://www.pseudomonas.com/downloads/pseudomonas/genbank/NC_002516.gbk

The original files look like this (these are just examples of the formats I have; the format I want is the one above):

Protein fasta - http://www.pseudomonas.com/downloads/pseudomonas/fasta/Pseudomonas_aeruginosa_2192_uid54357.faa

Chromosome fasta - http://www.pseudomonas.com/downloads/pseudomonas/fasta/NC_002516.fna

Thank you for the quick response


OK, so let me see if I understood what you need.

You have a file, but I did not understand whether it contains protein, DNA or RNA.

Your file is in this format (protein):

### Amino acid sequences for Pseudomonas aeruginosa 2192 proteins. ### Last updated on 2011-04-11.

PA2G_00002|hypothetical protein[Pseudomonas aeruginosa 2192] MASPAFMRFLPRCGAAAAFGTLLGLAGCQSWLDDRYAD ....

PA2G_00002|hy .....

And you want to convert it to this format (DNA):

>PA2G_00002|hypothetical protein[Pseudomonas aeruginosa 2192]
TTTAAAGAGACCGGCGATTCTAGTGAAATCGAACGGGCAGGTCAATTTC
CAACCAGCGATGACGTAATAGATAGATACAAGGAAGTCATTTTTCTTTTA
AAGGATAGAAACGGTTAATGCTCTTGGGACGGCGCTTTTCT

That means:

Do you want to convert protein to DNA? Is that it?

Or

  1. you want to cut all lines that start with a hash sign (#)
  2. remove the blank lines between lines
  3. and wrap the sequence at 70 columns
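
If it is the second option, the three steps can be sketched in plain Python as below. This is only a minimal sketch: the function name `clean_fasta` and its `width` parameter are made up for illustration, and it assumes every record header starts with ">".

```python
def clean_fasta(lines, width=70):
    """Drop '#' comment lines and blank lines, then rewrap each sequence.

    `lines` is any iterable of text lines; the result is a list of lines
    without trailing newlines. Header lines are assumed to start with '>'.
    """
    cleaned = []
    seq_parts = []

    def flush():
        # Join the pieces of the current sequence and rewrap them at `width`.
        if seq_parts:
            sequence = "".join(seq_parts)
            cleaned.extend(
                sequence[i:i + width] for i in range(0, len(sequence), width)
            )
            seq_parts.clear()

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # steps 1 and 2: skip comment lines and blank lines
        if line.startswith(">"):
            flush()  # finish the previous record before starting a new one
            cleaned.append(line)
        else:
            seq_parts.append(line.replace(" ", ""))
    flush()  # step 3 is applied to the last record as well
    return cleaned
```

To clean a file, pass the open file handle as `lines` and write the returned lines out joined with newlines.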

please, don't add a new answer but update your 1st answer.


I think you didn't get the question: Samuel needs to map the DNA/protein data against a whole genome (e.g. using BLAST) and generate a GenBank file from the output.


please, don't add a new answer but update your 1st answer.

OK Pierre. I think Samuel needs something like this: http://genome.nci.nih.gov/cgi-bin/gau/reformat. Only conversion. If the need is to compare sequences, do you think a pairwise alignment could solve it? http://www.ebi.ac.uk/Tools/psa/genewise/

9.9 years ago
Brice Sarver ★ 3.8k

No need to develop tools to do this. Many are publicly available for such a common task.

The standard for many years has been EMBOSS's seqret. An online version is here, but I would consider installing the suite if this is something you need to do often. The command-line version is as simple as seqret <in> <out>. Wrap it in a loop.
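
The loop could look roughly like the Python sketch below, for anyone who prefers that to a shell loop. It assumes EMBOSS is installed and `seqret` is on the PATH, that the inputs end in `.fasta`, and that the `-sequence`, `-outseq` and `-osformat2` qualifiers behave as in the EMBOSS documentation; the function names and output extensions are made up for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical choice of output extensions, one per target format.
SUFFIXES = {"genbank": ".gbk", "embl": ".embl"}


def seqret_command(fasta_path, out_format="genbank"):
    """Build the seqret command line for one FASTA file."""
    fasta_path = Path(fasta_path)
    out_path = fasta_path.with_suffix(SUFFIXES[out_format])
    return [
        "seqret",
        "-sequence", str(fasta_path),
        "-outseq", str(out_path),
        "-osformat2", out_format,
    ]


def convert_folder(folder, out_format="genbank"):
    """Run seqret once per *.fasta file in `folder` (assumes seqret is installed)."""
    for fasta_path in sorted(Path(folder).glob("*.fasta")):
        # check=True stops the loop with an error if seqret fails on a file.
        subprocess.run(seqret_command(fasta_path, out_format), check=True)
```

Calling `convert_folder("my_sequences")` would then write one `.gbk` file next to each input. A FASTA file holds only sequences, so the converted records carry no feature annotations.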

Biopython's SeqIO module can also do this, albeit with a bit more (basic) programming. I'm sure there are equivalents in BioPerl, BioRuby, and via Bioconductor for R.
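
A minimal sketch of the Biopython route follows. It assumes a recent Biopython (1.78 or later, where the GenBank writer needs a `molecule_type` annotation) and pipe-delimited FASTA IDs like the ones in the question; the function name and the way the LOCUS name is shortened are illustrative choices, not a fixed recipe.

```python
from Bio import SeqIO


def fasta_to_genbank(fasta_path, genbank_path, molecule_type="DNA"):
    """Rewrite every record of a FASTA file in GenBank format.

    Returns the number of records written. Use molecule_type="protein"
    for a protein FASTA file.
    """
    records = []
    for record in SeqIO.parse(fasta_path, "fasta"):
        # Assumption: Biopython >= 1.78, which requires this annotation.
        record.annotations["molecule_type"] = molecule_type
        # Assumption: IDs look like "PA2G_00002|...". Keep the part before
        # the pipe as a short LOCUS name (16 characters is the classic limit).
        record.name = record.id.split("|")[0][:16]
        records.append(record)
    return SeqIO.write(records, genbank_path, "genbank")
```

The output has valid GenBank headers and sequence, but no feature table, because the FASTA input has none. A tool that expects annotated records may therefore still reject it.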

This kind of task is day-one bioinformatics, and the skills required are easy to learn and very straightforward. It's as simple as navigating to a folder and running a program, possibly within the simplest of loops, depending on how your data is organized. You are already converting 7000 sequences; it would make sense to learn about the plethora of resources developed over the past couple of decades. You'll also save a ton of time in the future!


Sounds like the way to go: get my hands dirty with Biopython/Python. I tried EMBOSS's seqret too, but its output was rejected when I tried to identify genomic islands using http://www.pathogenomics.sfu.ca/islandviewer/genome_submit.php.
