I have 7000 genes and their proteins, as well as the genome of a bacterium I am working on. I want to convert these files into either GenBank or EMBL format. My problem is that I don't have any scripting skills. I tried an online tool (http://genome.nci.nih.gov/tools/reformat.html), but its output appears to lack some information.
Does anyone know a tool or a way I can convert these sequences?
I think you didn't get the question: Samuel needs to map the data (DNA/protein) against a whole genome (using e.g. BLAST) and generate a GenBank file from the output.
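To illustrate the mapping step only, here is a minimal sketch in plain Python. It is not BLAST: it swaps in exact string matching on both strands, which only works when each gene sequence occurs verbatim in the genome. The function names `reverse_complement` and `locate_genes` are made up for this example, and only the first occurrence of each gene is reported.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def locate_genes(genome, genes):
    """Locate each gene in the genome by exact matching (assumption: no mismatches).

    genome: DNA string; genes: dict mapping gene ID to DNA string.
    Returns a dict mapping gene ID to (start, end, strand) with 1-based
    inclusive coordinates on the forward strand, or None if not found.
    """
    genome = genome.upper()
    hits = {}
    for gene_id, seq in genes.items():
        seq = seq.upper()
        strand = "+"
        pos = genome.find(seq)
        if pos == -1:
            # Not on the forward strand; try the reverse strand.
            strand = "-"
            pos = genome.find(reverse_complement(seq))
        hits[gene_id] = (pos + 1, pos + len(seq), strand) if pos != -1 else None
    return hits
```

The coordinates and strand are what a GenBank feature table needs for each gene. For real data with mismatches or partial sequences, an aligner such as BLAST is the appropriate tool.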
No need to develop tools to do this. Many are publicly available for such a common task.
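The standard for many years has been EMBOSS's seqret. An online version is here, but I would consider installing the suite if this is something you need to do often. The command-line version is as simple as seqret <in> <out> (for example, seqret genes.fasta genbank::genes.gbk, where the file names are placeholders and the genbank:: prefix selects the output format). Wrap it in a loop if you have many files.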
Biopython's SeqIO module can also do this, albeit with a bit more (basic) programming. I'm sure there are equivalents in BioPerl, BioRuby, and via Bioconductor for R.
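As a rough idea of what the SeqIO route looks like, here is a minimal sketch. It assumes Biopython 1.78 or later is installed; the function name `fasta_to_genbank` and the file names in the usage note are placeholders, not part of Biopython. It writes one GenBank record per FASTA entry and does not add gene features, so it will not by itself reproduce a fully annotated file.

```python
from Bio import SeqIO


def fasta_to_genbank(fasta_path, genbank_path, molecule_type="DNA"):
    """Convert every record in a FASTA file to GenBank format.

    Returns the number of records written. Use molecule_type="protein"
    for a protein FASTA file.
    """
    def annotated_records():
        for record in SeqIO.parse(fasta_path, "fasta"):
            # FASTA carries no molecule type, but the GenBank writer needs one.
            record.annotations["molecule_type"] = molecule_type
            yield record

    return SeqIO.write(annotated_records(), genbank_path, "genbank")
```

Usage would be something like fasta_to_genbank("genes.fna", "genes.gbk"). Passing "embl" instead of "genbank" to SeqIO.write should give EMBL output the same way.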
This kind of task is day-one bioinformatics, and the skills required are straightforward and easy to learn. It's as simple as navigating to a folder and running a program, possibly within the simplest of loops, depending on how your data is organized. You are already converting 7000 sequences, so it makes sense to learn about the plethora of resources developed over the past couple of decades. You'll also save a ton of time in the future!
I can develop this script for you. Tell me the original file format and the desired final format.
These are the files...
And this is how I want them to look (NB: just the format).
Final format should be similar to this: http://www.pseudomonas.com/downloads/pseudomonas/genbank/NC_002516.gbk
The original files look like this. These are just examples of the formats I have; the format I want is the one above.
Protein fasta - http://www.pseudomonas.com/downloads/pseudomonas/fasta/Pseudomonas_aeruginosa_2192_uid54357.faa
Chromosome fasta - http://www.pseudomonas.com/downloads/pseudomonas/fasta/NC_002516.fna
Thank you for the quick response
OK, so let me see if I understood your need.
You have a file, but I did not understand whether it contains protein, DNA, or RNA.
You have your file in this format (protein):
### Amino acid sequences for Pseudomonas aeruginosa 2192 proteins. ### Last updated on 2011-04-11.
And you want to convert it to this format (DNA):
That means:
Do you want to convert protein to DNA? Is that it?
Or
#
Please don't add a new answer; update your first answer instead.
OK, Pierre. I think Samuel needs something like this: http://genome.nci.nih.gov/cgi-bin/gau/reformat. Only conversion. If the need is to compare sequences, do you think a pairwise alignment could solve it? http://www.ebi.ac.uk/Tools/psa/genewise/