Hi, I have a large GenBank file that contains multiple records. I got this file after running BLAST with my query sequence. The .gb file contains all the results of the BLAST search (the full GenBank record for each accession). I want to extract a certain CDS from each record in that .gb file. As an additional complication, this CDS is a join of two regions in the genome. Is there an automated way, using code in the Linux terminal, to extract my target CDS from all records and save the sequences in a separate file? Thanks in advance.
Please post an example .gb file and the expected output.
I can't find an option to upload the GenBank file here. It is a very long file, so should I just copy and paste it here?
Post one or two records as input, along with the expected output for those records.
Okay, following your suggestion, I trimmed the GenBank records. Here are two GenBank records in a single file:
From this file I want to extract one CDS, say ORF4, in FASTA format from every record. The output would be as follows (because of the character limit, I deleted part of each sequence from the FASTA output):
Try this code:
Thank you for your code. I ran it according to your instructions, but it generates the amino acid sequence of every CDS. My goal is to get the DNA sequence of one specific CDS (let's say ORF4).
For the nucleotide sequence, please try this:
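The code originally posted at this point is not available here, so what follows is only a minimal sketch of one way to do it, not the answerer's script. It uses plain Python (standard library only), reads the feature table of each record, and resolves `join(...)` and `complement(...)` locations against the ORIGIN sequence. Several things are assumptions: the function names (`extract_cds`, `resolve_location`, `parse_genbank`) are made up for illustration, the label `ORF4` is assumed to sit in a `/gene`, `/product`, `/locus_tag` or `/note` qualifier, remote references such as `ACCESSION:1..100` are rejected, and the two tiny `DEMO` records are invented test data, not the poster's file.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        depth += (ch == "(") - (ch == ")")
        current += ch
    parts.append(current)
    return parts

def resolve_location(location, seq):
    """Return the DNA for a location such as 'join(1..6,11..16)'."""
    location = location.strip()
    if ":" in location:  # assumption: remote references are out of scope
        raise ValueError("remote reference not supported: " + location)
    if location.startswith("complement(") and location.endswith(")"):
        inner = resolve_location(location[len("complement("):-1], seq)
        return inner.translate(COMPLEMENT)[::-1]
    if location.startswith("join(") and location.endswith(")"):
        pieces = split_top_level(location[len("join("):-1])
        return "".join(resolve_location(p, seq) for p in pieces)
    numbers = [int(n) for n in re.findall(r"\d+", location)]
    return seq[numbers[0] - 1:numbers[-1]]  # GenBank is 1-based, inclusive

def parse_genbank(text):
    """Yield (record_id, sequence, features); a feature is [key, location, qualifiers]."""
    for chunk in re.split(r"^//\s*$", text, flags=re.M):
        if "\nORIGIN" not in chunk:
            continue
        head, origin = chunk.split("\nORIGIN", 1)
        seq = re.sub(r"[^A-Za-z]", "", origin.partition("\n")[2])
        m = re.search(r"^VERSION\s+(\S+)", head, re.M) or re.search(r"^LOCUS\s+(\S+)", head, re.M)
        features, in_table = [], False
        for line in head.splitlines():
            if line.startswith("FEATURES"):
                in_table = True
            elif not in_table:
                continue
            elif re.match(r" {5}\S", line):  # feature key line
                key, _, loc = line.strip().partition(" ")
                features.append([key, loc.strip(), ""])
            elif line.startswith(" " * 6) and features:
                part = line.strip()
                if part.startswith("/") or features[-1][2]:
                    features[-1][2] += " " + part  # qualifier text
                else:
                    features[-1][1] += part  # wrapped location
            elif line and not line.startswith(" "):
                in_table = False  # next top-level keyword ends the table
        yield (m.group(1) if m else "unknown"), seq, features

def extract_cds(text, name, keys=("gene", "product", "locus_tag", "note")):
    """Return FASTA text holding the CDS labelled `name` from every record."""
    out = []
    for rec_id, seq, features in parse_genbank(text):
        for key, location, quals in features:
            pairs = re.findall(r'/(\w+)="([^"]*)"', quals)
            if key == "CDS" and any(k in keys and v == name for k, v in pairs):
                out.append(f">{rec_id}_{name}\n{resolve_location(location, seq)}\n")
    return "".join(out)

# Invented two-record example, for illustration only.
DEMO = """\
LOCUS       DEMO1                     20 bp    DNA     linear   UNK
VERSION     DEMO1.1
FEATURES             Location/Qualifiers
     CDS             join(1..6,11..16)
                     /gene="ORF4"
     CDS             complement(3..8)
                     /gene="ORF5"
ORIGIN
        1 atggccaaaa ttttaagggc
//
LOCUS       DEMO2                     20 bp    DNA     linear   UNK
VERSION     DEMO2.1
FEATURES             Location/Qualifiers
     CDS             complement(join(1..6,11..16))
                     /gene="ORF4"
ORIGIN
        1 ttaaaaccgg ggccatcccc
//
"""

print(extract_cds(DEMO, "ORF4"), end="")
# >DEMO1.1_ORF4
# atggccttttaa
# >DEMO2.1_ORF4
# atggccttttaa
```

On a real file, the same function could be applied to the file contents, for example `extract_cds(open("blast_hits.gb").read(), "ORF4")` (the file name is hypothetical), with the returned text written to a FASTA file. The sequences are not line-wrapped, and a well-tested parser such as Biopython's would be the more robust choice for production use.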
If the pasted code shows an HTML-escaped parenthesis (for example '&#40;'), replace it with '('.
Hey, I am trying this code, but I am getting the translated sequences of all the genes. I want the translated sequence of one particular gene only from the GB file. Can you help me out with this?
Hey Gen, did you figure out your problem? I am having the same issue.
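One possible approach for a single gene, sketched here under stated assumptions rather than taken from the code discussed in this thread: GenBank CDS features usually carry the protein in a `/translation` qualifier, so it can be read directly for the feature whose name matches. The function name `stored_translations`, the choice of qualifiers checked (`/gene`, `/product`, `/locus_tag`) and the small `DEMO_PROTEIN` record are all invented for illustration. A CDS without a `/translation` qualifier yields nothing.

```python
import re

def stored_translations(text, name, keys=("gene", "product", "locus_tag")):
    """Yield (record_id, protein) for the CDS labelled `name` in each record."""
    for record in re.split(r"^//\s*$", text, flags=re.M):
        version = re.search(r"^VERSION\s+(\S+)", record, re.M)
        rec_id = version.group(1) if version else "unknown"
        table = record.split("\nORIGIN")[0]  # drop the sequence block
        # A new feature starts on a line with exactly five leading spaces.
        for block in re.split(r"\n(?= {5}\S)", table):
            if not re.match(r" {5}CDS\s", block):
                continue
            quals = re.findall(r'/(\w+)="([^"]*)"', block)
            if any(k in keys and v == name for k, v in quals):
                for k, v in quals:
                    if k == "translation":
                        # Values wrap across lines; remove the whitespace.
                        yield rec_id, re.sub(r"\s+", "", v)

# Invented record, for illustration only.
DEMO_PROTEIN = """\
LOCUS       DEMO1                     20 bp    DNA     linear   UNK
VERSION     DEMO1.1
FEATURES             Location/Qualifiers
     CDS             join(1..6,11..16)
                     /gene="ORF4"
                     /translation="MA
                     F"
     CDS             3..8
                     /gene="ORF5"
                     /translation="GQ"
ORIGIN
        1 atggccaaaa ttttaagggc
//
"""

for rec_id, protein in stored_translations(DEMO_PROTEIN, "ORF4"):
    print(f">{rec_id}_ORF4\n{protein}")
# >DEMO1.1_ORF4
# MAF
```

Reading the stored qualifier avoids having to pick a genetic code table; translating the extracted DNA yourself would be the alternative when the qualifier is missing.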