Mighty Bioinformaticians,
I have a plain-text file with roughly 350 GenBank accession numbers, one per line, each representing a specific 16S sequence. For each accession number I want to extract the title of the corresponding paper (the "TITLE" line in the GenBank record) and the isolation source (the "/isolation_source" qualifier under "FEATURES").
Input example:
HM789874
JN225528
Desired output:
TITLE, /isolation_source of HM789874
TITLE, /isolation_source of JN225528
I have searched previous questions and found plenty of useful information on how this might be solved with BioPerl or Biopython, but as a biologist who is not yet a bioinformatician I find it a bit overwhelming, and I have not managed to combine the tips into a working script or command. FYI, I am a beginner with Perl/BioPerl and have yet to take my first steps in Python.
Any script or help would be greatly appreciated,
Sam
A shell one-liner:
OUTPUT: