3.8 years ago
jbt38
I have multiple GenBank records in one file and wish to extract all the gene sequences for alignment.
For example, if one GenBank record (for genus species1) contains gene1 and gene2, and another record (for genus species2) contains gene1 and gene8, I'm looking for an output of:
>gene1 genus species1
agtc...
>gene2 genus species1
agtc...
>gene1 genus species2
agtc...
>gene8 genus species2
agtc...
I have found many solutions that are almost what I want but not quite, e.g. the Python code here: https://www.researchgate.net/post/How_can_I_parse_a_GenBank_file_to_retrieve_specific_gene_sequences_with_IDs. But this returns the protein sequence, and you have to specify the gene exactly.
Please post an example input and the expected output for that input. In the meantime, try the following code, which handles both nucleotide and aa sequences. Edit it as needed:
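A minimal sketch of this kind of extraction, assuming Biopython is installed. This is not the code originally posted in this answer; the function names gene_records and genbank_to_gene_fasta are hypothetical, and it assumes the genes are annotated as "gene" features carrying a /gene (or /locus_tag) qualifier and that each record has an organism annotation:

```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def gene_records(record, feature_type="gene"):
    """Yield one SeqRecord per feature of the given type in a GenBank record.

    Hypothetical helper: the FASTA id is the gene name and the
    description is the organism, giving headers like ">gene1 genus species1".
    """
    organism = record.annotations.get("organism", record.id)
    for feature in record.features:
        if feature.type != feature_type:
            continue
        # Assumption: fall back to /locus_tag when /gene is absent.
        names = feature.qualifiers.get(
            "gene", feature.qualifiers.get("locus_tag", ["unknown"])
        )
        # extract() returns the nucleotide sequence and takes care of
        # reverse-strand and joined locations.
        sequence = feature.extract(record.seq)
        yield SeqRecord(sequence, id=names[0], description=organism)


def genbank_to_gene_fasta(gb_path, fasta_path, feature_type="gene"):
    """Write every gene of every record in gb_path to fasta_path.

    Works on a single file holding many GenBank records.
    Returns the number of sequences written.
    """
    genes = (
        gene
        for record in SeqIO.parse(gb_path, "genbank")
        for gene in gene_records(record, feature_type)
    )
    return SeqIO.write(genes, fasta_path, "fasta")
```

Usage would be something like genbank_to_gene_fasta("test.gb", "genes.fasta"). Passing feature_type="CDS" instead should give the coding regions only, still as nucleotides rather than protein.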
@cpad0112, please help me use your script. I ran your code with the command shown below (I renamed my GenBank file to test.gb, as mentioned in your code, and named your Python script scri.py), but ended up with the following error:
After format, please add (. It is missing. @Kumar

Thank you @cpad0112. I have revised your script (added ( after format) as you mentioned; however, I am getting another error as follows:

Without an example input file, it is difficult to troubleshoot the issue. I tested the script locally with the https://www.ncbi.nlm.nih.gov/nuccore/MK153192.1 gb file, and both aa and nt sequences were written to the hard disk.
See if the following simple code works:
Please add ( after format.
print should output the NT sequence in flattened format (on a single line), and SeqIO.write should write out FASTA in system paging format. Both print and SeqIO.write output the same information.
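To illustrate the difference between the two outputs, here is a small sketch, assuming Biopython is installed. The record below is made up for illustration, and since the thread does not show which format call was missing its parenthesis, the use of Python's str.format here is an assumption:

```python
import io

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Made-up 80 nt record, long enough to show line wrapping.
record = SeqRecord(Seq("ACGT" * 20), id="gene1", description="genus species1")

# Flattened form: header plus the whole sequence on one line.
# Note the opening parenthesis after format.
flat = ">{} {}\n{}".format(record.id, record.description, record.seq)
print(flat)

# SeqIO.write produces FASTA with the sequence wrapped at 60 characters per line.
buffer = io.StringIO()
SeqIO.write(record, buffer, "fasta")
wrapped = buffer.getvalue()
print(wrapped)
```

Both outputs carry the same header and the same bases; only the line wrapping differs.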
Input example of only one GenBank record. Imagine there are loads. These are in a single file:
Output. I added another species to show what I need but had to observe the character limit:
I will sort these later and align them for phylogenetic analysis